Abstract - The problem of designing good Space-Time Block
Codes (STBCs) with low maximum-likelihood (ML) decoding 
complexity has gathered much attention in the literature. All the 
known low ML decoding complexity techniques utilize the same 
approach of exploiting either the multigroup decodable or the 
fast-decodable (conditionally multigroup decodable) structure of 
a code. We refer to this well known technique of decoding STBCs 
as Conditional ML (CML) decoding. In this paper we introduce a 
new framework to construct ML decoders for STBCs based on 
the Generalized Distributive Law (GDL) and the Factor-graph 
based Sum-Product Algorithm. We say that an STBC is fast GDL 
decodable if the order of GDL decoding complexity of the code 
is strictly less than $M^{\lambda}$, where $\lambda$ is the number of independent
symbols in the STBC, and $M$ is the constellation size. We give
sufficient conditions for an STBC to admit fast GDL decoding, 
and show that both multigroup and conditionally multigroup 
decodable codes are fast GDL decodable. For any STBC, whether 
fast GDL decodable or not, we show that the GDL decoding 
complexity is strictly less than the CML decoding complexity. For 
instance, for any STBC obtained from Cyclic Division Algebras 
which is not multigroup or conditionally multigroup decodable, 
the GDL decoder provides about 12 times reduction in complexity 
compared to the CML decoder. Similarly, for the Golden code, 
which is conditionally multigroup decodable, the GDL decoder 
is only half as complex as the CML decoder. 



I. Introduction 

THE complexity with which a Space-Time Block Code 
(STBC) can be maximum-likelihood (ML) decoded is an 
important parameter from an implementation point of view. 
Consequently, the problem of designing codes with high rate 
and good error performance that admit low complexity ML 
decoding is of much interest in the literature. This problem 
was first attacked by constructing multigroup decodable codes 
which have the property that the information symbols of the 
code can be partitioned into several groups, and each group 
of symbols can be ML decoded independent of other symbol 
groups. Examples include the Orthogonal Designs [1]-[3]
and the higher rate multigroup decodable STBCs constructed
in [4]-[15]. In [16], it was shown that a new class of STBCs
called fast-decodable or conditionally multigroup decodable 
codes allow reduced complexity decoding as well. These codes 
contain a lower rate multigroup decodable STBC as a subcode, 
and this property is leveraged to decode such STBCs with low 
complexity. Examples of fast-decodable codes available in the
literature include [17]-[24], the Silver code [25], [26] and the
Golden Code [27]-[29], [16]. All known low complexity ML
decoders have the same unified approach of exploiting either 
the multigroup decodability or the conditional multigroup 
decodability of a code. This method is well known and widely 
used in the literature, and we will refer to it as Conditional 
ML (CML) decoding. 

The Generalized Distributive Law [30] and its equivalent
factor-graph based approach, known as the Sum-Product
Algorithm [31], are message-passing algorithms that efficiently
solve a class of computation problems called Marginalize
a Product Function (MPF) problems. The Generalized Distributive
Law (GDL) includes as special cases the Viterbi
algorithm [32], the BCJR algorithm [33], the Fast Fourier
Transform [34], and the Turbo [35] and LDPC decoding
algorithms [36], [37]. In this paper, we first identify that the ML
decoding problem of any STBC is equivalent to the problem 
of minimizing a multivariate, second degree real polynomial, 
where the variables assume values from a finite signal set. 
Using this observation we show that the ML decoding of any 
STBC is an MPF problem, and hence, the GDL is a natural 
choice for constructing low complexity ML decoders. The 
contribution and organization of this paper are as follows. 

• We introduce a new, GDL based framework to design ML
decoders for STBCs (Section III and Section IV-A). Since
the GDL is computationally efficient, this new framework
provides a rich scope for designing low complexity ML
decoders.

• We show that the GDL decoding complexity of any
code is strictly less than its CML decoding complexity
(Theorems 2 and 3, Section IV-B). As an application of
our results, we show that for any STBC obtained from
Cyclic Division Algebras [38] which is not multigroup
or conditionally multigroup decodable, the GDL decoder
is approximately 12 times less complex than the CML
decoder. In the case of the Golden code, which is conditionally
multigroup decodable, the GDL decoder is roughly
half as complex as the CML decoder (Example G.4,
Section IV-C). The GDL can lead to reductions in the order
of decoding complexity as well, when compared to the
CML decoder. We give explicit examples of two classes
of STBCs, the Toeplitz codes [39] and the Overlapped
Alamouti Codes [40], where the GDL decoder has a lower
complexity order than the CML decoder (Section IV-B).

• We give sufficient conditions for a code to be fast GDL
decodable, i.e., to admit low complexity GDL decoding,
and show that both multigroup and conditionally
multigroup decodable codes are amenable to fast GDL
decoding (Section IV-B). Using the new GDL framework
we also provide tools to readily identify multigroup and
conditionally multigroup decodable codes (Section IV-B).

• When the information symbols of a code are encoded
using a PAM signal set, we show that the GDL algorithm
can exploit the structure of PAM to lead to further
reduction in decoding complexity (Section IV-C).

A brief review of the GDL is given in Section II and the
paper is concluded in Section V.

Notations - Throughout the paper, matrices (vectors) are
denoted by bold, uppercase (lowercase) letters. The Hermitian
and Frobenius norm of a matrix $\mathbf{X}$ are denoted by $\mathbf{X}^H$ and
$\|\mathbf{X}\|_F$ respectively. For a square matrix $\mathbf{X}$, $\mathrm{tr}(\mathbf{X})$ denotes the
trace of $\mathbf{X}$. Unless used as a subscript or to denote indices,
$j$ represents $\sqrt{-1}$. The sets of all real and complex numbers
are denoted by $\mathbb{R}$ and $\mathbb{C}$, respectively. The $m \times m$ sized null
matrix is denoted by $\mathbf{0}_m$. For any set $I$, its complement in
the corresponding universal set is denoted by $I^c$.

II. A Brief Review of the Generalized 
Distributive Law 

In Section III we show that the ML decoding of STBCs is
an instance of a particular class of MPF problems: the MPF 
problems on the min-sum semiring over the real numbers R. 
We now recall the definition of this class of computational 
problems, their GDL solution and some properties of the GDL 
which we use in the later sections. 

A. MPF problems on the min-sum semiring over R 

Consider the union of the set of real numbers $\mathbb{R}$ and the
element infinity, $\infty$. With multiplication defined on this set as
the sum of two elements, and addition defined as the operation
of taking the minimum, we get the min-sum semiring over
$\mathbb{R}$. The elements $\infty$ and $0$ are the additive and multiplicative
identities respectively. The class of MPF problems defined on
this semiring is as follows [30]. Let $x_1, \dots, x_N$ be variables
that take values independently from finite sets $\mathcal{A}_1, \dots, \mathcal{A}_N$
respectively. For any $I = \{i_1, \dots, i_{|I|}\} \subseteq \{1, \dots, N\}$ with
$i_1 < i_2 < \cdots < i_{|I|}$, denote by $\mathcal{A}_I$ the set
$\mathcal{A}_{i_1} \times \cdots \times \mathcal{A}_{i_{|I|}}$, and denote by $x_I$ the variable list
$(x_{i_1}, \dots, x_{i_{|I|}})$. Let $S = \{I_1, \dots, I_L\}$ be a set of $L$ subsets
of $\{1, \dots, N\}$, and for each $\ell = 1, \dots, L$, let
$\alpha_\ell : \mathcal{A}_{I_\ell} \to \mathbb{R}$ be a given function, i.e., a table of values.
Define functions $\beta : \mathcal{A}_{\{1,\dots,N\}} \to \mathbb{R}$ and
$\beta_\ell : \mathcal{A}_{I_\ell} \to \mathbb{R}$, $\ell = 1, \dots, L$, as follows:

$$\beta(x_1, \dots, x_N) = \sum_{\ell=1}^{L} \alpha_\ell(x_{I_\ell}), \quad \text{and} \tag{1}$$

$$\beta_\ell(x_{I_\ell}) = \min_{x_{I_\ell^c} \in \mathcal{A}_{I_\ell^c}} \beta(x_1, \dots, x_N), \tag{2}$$

where $\sum$ denotes addition of real numbers, and $I_\ell^c$ is the
complement of $I_\ell$ in $\{1, \dots, N\}$. The MPF problem on the
min-sum semiring over $\mathbb{R}$ is to compute the table of values
of the function $\beta_\ell$ for one or more $\ell = 1, \dots, L$, given the
functions $\alpha_1, \dots, \alpha_L$. The function $\beta$ is called the global
kernel and the function $\beta_\ell$ is called the $x_{I_\ell}$-marginalization
of $\beta$ or the objective function at $I_\ell$.
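As an illustration of definitions (1) and (2), the following minimal Python sketch evaluates an objective function by exhaustive enumeration. The function name `marginalize`, the 0-based variable indices and the three toy kernels are assumptions made for this illustration only; they do not come from the paper.

```python
from itertools import product

def marginalize(alphabets, kernels, target):
    """Brute-force min-sum marginalization, following (1) and (2).

    alphabets: list of N finite value lists (the sets A_1, ..., A_N)
    kernels:   list of (index_tuple, function) pairs, one per alpha_l
    target:    tuple of variable indices (the set I_l), 0-based
    Returns a dict mapping each assignment of x_{I_l} to beta_l.
    """
    table = {}
    for x in product(*alphabets):
        # Global kernel (1): the semiring "product" is an ordinary sum.
        beta = sum(f(*(x[i] for i in idx)) for idx, f in kernels)
        key = tuple(x[i] for i in target)
        # Marginalization (2): the semiring "sum" is a minimum.
        table[key] = min(table.get(key, float("inf")), beta)
    return table

# Toy problem with three binary variables (hypothetical kernels).
alphabets = [[-1, 1], [-1, 1], [-1, 1]]
kernels = [
    ((0,), lambda a: (a - 0.9) ** 2),
    ((0, 1), lambda a, b: 0.5 * a * b),
    ((1, 2), lambda b, c: (b + c) ** 2),
]
beta_0 = marginalize(alphabets, kernels, (0,))
```

The enumeration visits every point of $\mathcal{A}_{\{1,\dots,N\}}$, which is exactly the cost that the GDL tries to avoid.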



B. The Generalized Distributive Law 

The GDL is a message-passing algorithm that operates on a
simple tree (an undirected, unweighted, connected$^1$ graph with
no loops, cycles or multiple edges) $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Each vertex
$v \in \mathcal{V}$ is associated with a function $\alpha_v : \mathcal{A}_{I_v} \to \mathbb{R}$, for some
$I_v \subseteq \{1, \dots, N\}$. The function $\alpha_v$ is called the local kernel
at $v$, and the variable list $x_{I_v}$ is called the local domain at $v$.
The tree $\mathcal{G}$ can be used to solve the MPF problem given in (2)
using the GDL if it satisfies the following three conditions:

C.1 for each $\ell = 1, \dots, L$, there exists a $v \in \mathcal{V}$ such that
$I_\ell = I_v$,
C.2 the global kernel $\beta = \sum_{\ell=1}^{L} \alpha_\ell = \sum_{v \in \mathcal{V}} \alpha_v$, and
C.3 the tree $\mathcal{G}$ satisfies the junction tree condition, i.e., for
each $n = 1, \dots, N$, the subgraph of $\mathcal{G}$ consisting of those
vertices whose local domains contain $x_n$ together with
the edges connecting these vertices is connected.

A tree $\mathcal{G}$ that satisfies all the three conditions above is said
to be a junction tree for the given MPF problem. In general
there is no unique junction tree for an MPF problem, and
different junction trees may lead to GDL algorithms with
varying complexities of implementation. Various methods to
construct/transform junction trees are given in [30], [31].

For any two neighboring vertices $u$ and $v$, the directed
message from $u$ to $v$ is a table of values of a function
$\mu_{u,v} : \mathcal{A}_{I_u \cap I_v} \to \mathbb{R}$. To send a message to $v$, the vertex $u$
forms the sum of its local kernel with the messages that it
has received from all its neighbors other than $v$, and then
marginalizes this sum with respect to the variables common
to $u$ and $v$, i.e.,

$$\mu_{u,v}(x_{I_u \cap I_v}) = \min_{x_{I_u \setminus I_v}} \Big( \alpha_u(x_{I_u}) + \sum_{\substack{w \,\mathrm{adj}\, u \\ w \neq v}} \mu_{w,u}(x_{I_w \cap I_u}) \Big),$$

where $w \,\mathrm{adj}\, u$ denotes that the vertices $w$ and $u$ are neighbors.
The state of the vertex $u$ is a table of values of a function
$\sigma_u : \mathcal{A}_{I_u} \to \mathbb{R}$. Initially $\sigma_u$ is set to be equal to the local kernel
at $u$. During the GDL algorithm it is updated as the sum of
the local kernel at $u$ with the messages that $u$ has received
from all its neighbors, i.e.,

$$\sigma_u(x_{I_u}) = \alpha_u(x_{I_u}) + \sum_{w \,\mathrm{adj}\, u} \mu_{w,u}(x_{I_w \cap I_u}).$$

In order to solve the all-vertex problem, i.e., to compute
the $x_{I_v}$-marginalization of $\beta$ for every $v \in \mathcal{V}$, every vertex
is made to send a message to a neighbor when for the first
time it receives messages from all its other neighbors. So the
messages begin at the leaves of the junction tree, proceed
inwards into the tree and then travel back outwards. At the
end of this message-passing schedule, each vertex computes its
state, which is guaranteed to be equal to the objective function
at that vertex [30]. The objective function $\beta_\ell$ given in (2) is
thus equal to the state of any vertex $v$ with $I_v = I_\ell$. To solve a
single-vertex problem, i.e., to compute the $x_{I_v}$-marginalization
of $\beta$ for a given vertex $v$, all the edges of the junction tree
are directed towards the root $v$. Every vertex except $v$ sends
exactly one message to its neighbor along the unique path to
$v$ when it has received messages from all its other neighbors.
The state at $v$ is computed once $v$ receives messages from all
its neighbors, and this equals the objective function at $v$.

$^1$A graph is said to be connected if there exists a path between every pair
of nodes.
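The single-vertex schedule can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function `gdl_single_vertex`, the dictionary-based tree representation, the 0-based variable indices and the three-vertex toy junction tree are all hypothetical choices made here, not the authors' implementation. Messages are computed recursively, which reproduces the leaves-to-root order described above.

```python
from itertools import product

def gdl_single_vertex(alphabets, domains, kernels, edges, root):
    """Min-sum GDL, single-vertex problem: returns the state at `root`.

    domains[v]: tuple of variable indices (local domain at vertex v)
    kernels[v]: function of the values of the variables in domains[v]
    edges:      list of (u, v) pairs forming a junction tree
    """
    adj = {v: [] for v in domains}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def shared(u, v):
        return tuple(sorted(set(domains[u]) & set(domains[v])))

    def restrict(x, dom, idx):
        return tuple(x[dom.index(i)] for i in idx)

    def partial_state(u, excluded=None):
        # Local kernel at u plus messages from all neighbours but `excluded`.
        incoming = [(shared(w, u), message(w, u))
                    for w in adj[u] if w != excluded]
        table = {}
        for x in product(*(alphabets[i] for i in domains[u])):
            total = kernels[u](*x)
            for idx, msg in incoming:
                total += msg[restrict(x, domains[u], idx)]
            table[x] = total
        return table

    def message(u, v):
        # Minimize over the variables of u that are not shared with v.
        idx = shared(u, v)
        out = {}
        for x, val in partial_state(u, excluded=v).items():
            key = restrict(x, domains[u], idx)
            out[key] = min(out.get(key, float("inf")), val)
        return out

    return partial_state(root)

# Toy junction tree a - b - c over three binary variables (hypothetical).
alphabets = [[-1, 1], [-1, 1], [-1, 1]]
domains = {"a": (0,), "b": (0, 1), "c": (1, 2)}
kernels = {
    "a": lambda a: (a - 0.9) ** 2,
    "b": lambda a, b: 0.5 * a * b,
    "c": lambda b, c: (b + c) ** 2,
}
edges = [("a", "b"), ("b", "c")]
state_a = gdl_single_vertex(alphabets, domains, kernels, edges, root="a")
```

In this toy tree the largest local domain has two variables, so no table larger than $2^2$ entries is ever built, whereas exhaustive enumeration of the global kernel needs $2^3$ evaluations.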

The total number of additions and pairwise comparisons (for
implementing min) in the case of the single-vertex problem for
any root vertex $v$ is equal to

$$C(\mathcal{G}) = \sum_{u \in \mathcal{V}} \big( |\mathcal{A}_{I_u}| + \ldots \big), \tag{3}$$

where $d_u$ is the degree of the vertex $u$. The all-vertex GDL
schedule can be implemented with complexity of at most
$4C(\mathcal{G})$. The complexity order for both single and all-vertex
problems is thus $\max_{u \in \mathcal{V}} |\mathcal{A}_{I_u}|$.

The messages passed during the GDL schedule can be
characterized precisely using the local kernels of $\mathcal{G}$. In both
the single and the all-vertex GDL schedules, the directed
message from a vertex $u$ to its neighbor $v$ is the
$x_{I_u \cap I_v}$-marginalization of the sum of the local kernels of all the
vertices descending from $u$ [31]. More formally, consider the
two disjoint trees $\mathcal{G}_{u \setminus v}$ and $\mathcal{G}_{v \setminus u}$ obtained from $\mathcal{G}$ by removing
the edge $(u, v) \in \mathcal{E}$, such that $\mathcal{G}_{u \setminus v}$ contains the vertex $u$ and
$\mathcal{G}_{v \setminus u}$ contains $v$. Then we have

$$\mu_{u,v}(x_{I_u \cap I_v}) = \min_{x_{(I_u \cap I_v)^c}} \ \sum_{w \in \mathcal{G}_{u \setminus v}} \alpha_w(x_{I_w}).$$

The GDL algorithm capitalizes on the 'factorization' of $\beta$,
as given in (1), into $L$ functions whose domains are smaller
than that of $\beta$ itself, and hence are less complex to work with
compared to $\beta$. During the message-passing, partial sums of
these 'smaller' functions are calculated, and these are used
efficiently to compute the various $x_{I_\ell}$-marginalizations of $\beta$.

III. The GDL Decoding of Space-Time Block Codes

In this section, we first introduce the notion of encoding
groups in STBCs obtained from linear designs, and then using
this concept, formulate the ML decoding of such STBCs as
an MPF problem over the min-sum semiring over $\mathbb{R}$. We then
propose a junction tree to decode any STBC obtained from
linear designs using the GDL message-passing algorithm.

A. Channel model, designs and encoding groups

We consider the block fading MIMO channel with full
channel state information (CSI) at the receiver and no CSI
at the transmitter. For an $n_t \times n_r$ MIMO transmission, we
have

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{4}$$

where $\mathbf{X} \in \mathbb{C}^{n_t \times T}$ is the codeword matrix transmitted over
$T$ channel uses, $\mathbf{N} \in \mathbb{C}^{n_r \times T}$ is a complex white Gaussian
noise matrix whose entries are i.i.d. with zero mean and unit
variance, and $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix with arbitrary
probability distribution. An STBC $\mathcal{C}$ is a finite set of $n_t \times T$
complex matrices. We consider codes that are obtained from
designs $\mathbf{S} = \sum_{i=1}^{K} s_i \mathbf{A}_i$, where $s_1, \dots, s_K$ are real variables
or information symbols and $\mathbf{A}_i \in \mathbb{C}^{n_t \times T}$ are the weight or
linear dispersion matrices [12], [41]. The rate of the resulting
code is $\frac{K}{2T}$ complex symbols per channel use. Commonly in
the literature the real variables $\{s_i\}$ are combined pairwise,
and the design is represented in terms of the resulting complex
information symbols. Examples include matrix designs whose
individual entries are complex linear combinations of complex
variables and their conjugates.

Let the symbols $\{s_1, \dots, s_K\}$ be partitioned into $N$ subsets,
called encoding groups, such that the symbols in different
encoding groups are encoded independently and all the symbols
in each encoding group are encoded jointly. For $n = 1, \dots, N$,
let $\mathbf{x}_n$ be the vector consisting of the information symbols
belonging to the $n$th encoding group, and let $\mathbf{x}_n$ be encoded
using a finite set $\mathcal{A}_n \subset \mathbb{R}^{\lambda_n}$, where $\lambda_n$ is the number of real
symbols in the $n$th encoding group. The STBC obtained from
the design $\mathbf{S}$ and the signal sets $\mathcal{A}_1, \dots, \mathcal{A}_N$ is

$$\mathcal{C} = \Big\{ \sum_{i=1}^{K} s_i \mathbf{A}_i \ \Big|\ \mathbf{x}_n \in \mathcal{A}_n,\ n = 1, \dots, N \Big\}.$$



Example T.1: Consider the Toeplitz code [39] for $n_t = 2$
antennas and $T = 10$ time slots. The number of real symbols
is $K = 18$ and the design is

$$\mathbf{S} = \begin{bmatrix} s_1 + js_2 & s_3 + js_4 & s_5 + js_6 & \cdots & s_{17} + js_{18} & 0 \\ 0 & s_1 + js_2 & s_3 + js_4 & \cdots & s_{15} + js_{16} & s_{17} + js_{18} \end{bmatrix}.$$

Let the complex symbols $s_{2n-1} + js_{2n}$, $n = 1, \dots, 9$,
be encoded using a HEX constellation [42] $\mathcal{A}_{HEX} \subset \mathbb{R}^2$.
This STBC has 9 encoding groups and the vectors $\mathbf{x}_n$,
$n = 1, \dots, 9$, are given by $\mathbf{x}_n = [s_{2n-1} \ s_{2n}]^T$. The number
of symbols per encoding group is $\lambda_n = 2$ and the finite
sets $\mathcal{A}_n = \mathcal{A}_{HEX}$ for $n = 1, \dots, 9$. ■
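The structure of the Toeplitz design in Example T.1 can be sketched as follows. The helper name `toeplitz_design` and the use of the integers 1 to 18 as stand-in symbol values are assumptions for illustration; a real encoder would draw each pair $(s_{2n-1}, s_{2n})$ from the HEX constellation.

```python
def toeplitz_design(s, nt=2):
    """Build the nt x T Toeplitz design from K real symbols.

    Consecutive pairs (s[2k], s[2k+1]) form the complex symbols; row r is
    the complex symbol sequence delayed by r time slots and zero padded,
    so that T = K/2 + nt - 1.
    """
    z = [complex(s[2 * k], s[2 * k + 1]) for k in range(len(s) // 2)]
    T = len(z) + nt - 1
    return [[z[t - r] if 0 <= t - r < len(z) else 0j for t in range(T)]
            for r in range(nt)]

# Placeholder symbol values 1..18, chosen only to make positions visible.
X = toeplitz_design(list(range(1, 19)))
```

With $K = 18$ and $n_t = 2$ this yields a $2 \times 10$ matrix, in agreement with $T = 10$ in the example.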
A subset of real information symbols $\{s_1, \dots, s_K\}$ that
are encoded together using an arbitrary joint signal set must
be decoded jointly by an ML decoder. The encoding groups
$\mathbf{x}_1, \dots, \mathbf{x}_N$ are the fundamental units of information variables
that any ML decoder will operate on. For a given STBC the
choice of the weight matrices $\{\mathbf{A}_i\}$, encoding groups $\{\mathbf{x}_n\}$
and the signal sets $\{\mathcal{A}_n\}$ may not be unique. As illustrated in
the following example, a careful choice of the weight matrices
and signal sets can reduce the number of real symbols per
encoding group. This reduction in encoding complexity may
get reflected as a reduction in the ML decoding complexity at
the receiver.

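Example G.1: Consider the Dayal-Varanasi version of the
Golden Code [28]:

$$\mathbf{S}_1 = \begin{bmatrix} s_1 + js_2 & \gamma(s_5 + js_6) \\ \gamma(s_7 + js_8) & s_3 + js_4 \end{bmatrix},$$

where $\gamma = \sqrt{j}$ and the symbol vectors
$[s_1 + js_2 \ \ s_3 + js_4]^T$ and $[s_5 + js_6 \ \ s_7 + js_8]^T$ are
encoded independently using a constellation from the rotated
lattice $\mathbf{R}\mathbb{Z}[j]^2$ with

$$\mathbf{R} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad c = \cos\left(\frac{\tan^{-1}(2)}{2}\right) \ \text{and} \ s = \sin\left(\frac{\tan^{-1}(2)}{2}\right).$$

A naive choice for the symbol groups is

$$\mathbf{x}_1 = [s_1 \ s_2 \ s_3 \ s_4]^T, \quad \mathbf{x}_2 = [s_5 \ s_6 \ s_7 \ s_8]^T.$$

The corresponding weight matrices are

$$\mathbf{A}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \mathbf{A}_2 = \begin{bmatrix} j & 0 \\ 0 & 0 \end{bmatrix},\ \mathbf{A}_3 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},\ \mathbf{A}_4 = \begin{bmatrix} 0 & 0 \\ 0 & j \end{bmatrix},$$

$$\mathbf{A}_5 = \begin{bmatrix} 0 & \gamma \\ 0 & 0 \end{bmatrix},\ \mathbf{A}_6 = \begin{bmatrix} 0 & j\gamma \\ 0 & 0 \end{bmatrix},\ \mathbf{A}_7 = \begin{bmatrix} 0 & 0 \\ \gamma & 0 \end{bmatrix} \ \text{and} \ \mathbf{A}_8 = \begin{bmatrix} 0 & 0 \\ j\gamma & 0 \end{bmatrix}.$$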



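It is shown in Example G.4 of Section IV-C that this choice of
encoding groups leads to GDL based decoders with complexity
equal to that of brute-force ML decoding. A better choice
of weight matrices and encoding groups can be obtained by a
simple linear transformation of the symbols $\{s_i\}$. The resulting
design $\mathbf{S}_2$ is given in (5). The
symbols $\{s_i\}$ of this new design are encoded independently
of each other using a PAM constellation. Both $\mathbf{S}_1$ and $\mathbf{S}_2$
give the same STBC though they are encoded using different
sets of weight matrices and constellations. The number of
encoding groups in $\mathbf{S}_2$ is 8, and each symbol $s_i$ forms an
encoding group by itself, i.e., $\mathbf{x}_n = [s_n]$, $n = 1, \dots, 8$. The
corresponding weight matrices are

$$\mathbf{A}_1 = \begin{bmatrix} c & 0 \\ 0 & -s \end{bmatrix},\ \mathbf{A}_2 = \begin{bmatrix} jc & 0 \\ 0 & -js \end{bmatrix},\ \mathbf{A}_3 = \begin{bmatrix} s & 0 \\ 0 & c \end{bmatrix},\ \mathbf{A}_4 = \begin{bmatrix} js & 0 \\ 0 & jc \end{bmatrix},$$

$$\mathbf{A}_5 = \begin{bmatrix} 0 & \gamma c \\ -\gamma s & 0 \end{bmatrix},\ \mathbf{A}_6 = \begin{bmatrix} 0 & j\gamma c \\ -j\gamma s & 0 \end{bmatrix},\ \mathbf{A}_7 = \begin{bmatrix} 0 & \gamma s \\ \gamma c & 0 \end{bmatrix} \ \text{and} \ \mathbf{A}_8 = \begin{bmatrix} 0 & j\gamma s \\ j\gamma c & 0 \end{bmatrix}.$$

This choice of encoding groups leads to reduced complexity
ML decoding as will be shown in Example G.4. ■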



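B. The GDL Decoding of STBCs

Given the $n_r \times T$ received matrix $\mathbf{Y}$ in (4), the ML decoder
finds the set of variables $\{s_1, \dots, s_K\}$ that minimizes
$\|\mathbf{Y} - \mathbf{H}\sum_{i=1}^{K} s_i\mathbf{A}_i\|_F^2$. The ML decoding problem is to find

$$\begin{aligned}
&\arg\min \ \mathrm{tr}\Big( \Big(\mathbf{Y} - \sum_{i=1}^{K} s_i\mathbf{H}\mathbf{A}_i\Big)\Big(\mathbf{Y}^H - \sum_{i=1}^{K} s_i\mathbf{A}_i^H\mathbf{H}^H\Big) \Big) \\
&= \arg\min \ \mathrm{tr}(\mathbf{Y}\mathbf{Y}^H) + \sum_{i=1}^{K} s_i\, \mathrm{tr}\big(-\mathbf{H}\mathbf{A}_i\mathbf{Y}^H - \mathbf{Y}\mathbf{A}_i^H\mathbf{H}^H\big) + \sum_{i=1}^{K} s_i^2\, \mathrm{tr}\big(\mathbf{H}\mathbf{A}_i\mathbf{A}_i^H\mathbf{H}^H\big) \\
&\qquad + \sum_{i=1}^{K} \sum_{j>i} s_i s_j\, \mathrm{tr}\big(\mathbf{H}(\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H)\mathbf{H}^H\big) \\
&= \arg\min \ f(s_1, \dots, s_K),
\end{aligned}$$

where $\mathrm{tr}(\cdot)$ is the trace of a square matrix, and

$$f(s_1, \dots, s_K) = \sum_{i=1}^{K} \big( s_i\xi_i + s_i^2\xi_{i,i} \big) + \sum_{i=1}^{K} \sum_{j>i} s_i s_j \xi_{i,j},$$

$$\xi_i = \mathrm{tr}\big(-\mathbf{H}\mathbf{A}_i\mathbf{Y}^H - \mathbf{Y}\mathbf{A}_i^H\mathbf{H}^H\big),$$

$$\xi_{i,j} = \mathrm{tr}\big(\mathbf{H}(\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H)\mathbf{H}^H\big) \ \text{for } j > i, \text{ and}$$

$$\xi_{i,i} = \mathrm{tr}\big(\mathbf{H}\mathbf{A}_i\mathbf{A}_i^H\mathbf{H}^H\big).$$

Since the matrices $\mathbf{H}\mathbf{A}_i\mathbf{Y}^H + \mathbf{Y}\mathbf{A}_i^H\mathbf{H}^H$, $\mathbf{H}\mathbf{A}_i\mathbf{A}_i^H\mathbf{H}^H$ and
$\mathbf{H}(\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H)\mathbf{H}^H$ are Hermitian, the coefficients $\xi_i$,
$\xi_{i,i}$, $\xi_{i,j}$ are all real.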
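The expansion above can be checked numerically: for any channel, received matrix and symbol values, $\|\mathbf{Y} - \mathbf{H}\mathbf{X}\|_F^2$ should equal $\mathrm{tr}(\mathbf{Y}\mathbf{Y}^H) + f(s_1, \dots, s_K)$. The sketch below does this in plain Python. The helper names, the random test data and the use of the Alamouti weight matrices as a test design are assumptions made for illustration only.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def herm(A):
    # Conjugate transpose.
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frob2(A):
    return sum(abs(a) ** 2 for row in A for a in row)

def xi_coefficients(H, Y, weights):
    """Return (xi_i) and (xi_{i,j}); both are real, as noted in the text."""
    HA = [matmul(H, A) for A in weights]
    K = len(weights)
    # tr(-H A_i Y^H - Y A_i^H H^H) = -2 Re tr(H A_i Y^H)
    xi1 = [-2 * trace(matmul(HA[i], herm(Y))).real for i in range(K)]
    # xi_{i,i} = tr(H A_i A_i^H H^H); xi_{i,j} = 2 Re tr(H A_i A_j^H H^H)
    xi2 = [[(1 if i == j else 2) * trace(matmul(HA[i], herm(HA[j]))).real
            for j in range(K)] for i in range(K)]
    return xi1, xi2

def f(s, xi1, xi2):
    K = len(s)
    total = sum(s[i] * xi1[i] + s[i] ** 2 * xi2[i][i] for i in range(K))
    total += sum(s[i] * s[j] * xi2[i][j]
                 for i in range(K) for j in range(i + 1, K))
    return total

random.seed(0)
rnd = lambda: complex(random.gauss(0, 1), random.gauss(0, 1))
H = [[rnd() for _ in range(2)] for _ in range(2)]
Y = [[rnd() for _ in range(2)] for _ in range(2)]
# Test design: Alamouti weight matrices (any weight matrices would do).
weights = [[[1, 0], [0, 1]], [[1j, 0], [0, -1j]],
           [[0, -1], [1, 0]], [[0, 1j], [1j, 0]]]
s = [1, -3, 3, -1]
X = [[sum(s[i] * weights[i][r][c] for i in range(4)) for c in range(2)]
     for r in range(2)]
HX = matmul(H, X)
metric = frob2([[Y[r][c] - HX[r][c] for c in range(2)] for r in range(2)])
xi1, xi2 = xi_coefficients(H, Y, weights)
expanded = frob2(Y) + f(s, xi1, xi2)
```

Here `metric` is the direct ML metric and `expanded` is the polynomial form; the two agree up to floating-point rounding.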

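The function $f(s_1, \dots, s_K)$ is a second degree polynomial
over $\mathbb{R}$. We now partition the terms of this polynomial according
to the encoding groups $\{\mathbf{x}_n\}$. The terms in $f$ that consist
of variables only from the $n$th encoding group are summed
together into the function $\alpha_n(\mathbf{x}_n)$. For $n < m$, those terms in
$f$ that contain exactly one variable each from the $n$th and the
$m$th encoding groups are summed together to get the function
$\alpha_{n,m}(\mathbf{x}_n, \mathbf{x}_m)$. For $n = 1, \dots, N$, let $\psi(n)$ denote the set of
indices of those real symbols $s_i$ that are in the $n$th encoding
group $\mathbf{x}_n$. Then for $n = 1, \dots, N$, we have

$$\alpha_n(\mathbf{x}_n) = \sum_{i \in \psi(n)} \big( s_i\xi_i + s_i^2\xi_{i,i} \big) + \sum_{i \in \psi(n)} \sum_{\substack{j \in \psi(n) \\ j > i}} s_i s_j \xi_{i,j},$$

and for all $1 \le n < m \le N$ we have

$$\alpha_{n,m}(\mathbf{x}_n, \mathbf{x}_m) = \sum_{i \in \psi(n)} \sum_{j \in \psi(m)} s_i s_j \xi_{i,j}, \tag{6}$$

where, since the defining expression is symmetric in $i$ and $j$, we take
$\xi_{i,j} = \xi_{j,i}$ whenever $i > j$. Define

$$\beta(\mathbf{x}_1, \dots, \mathbf{x}_N) = \sum_{n=1}^{N} \alpha_n(\mathbf{x}_n) + \sum_{n=1}^{N} \sum_{m > n} \alpha_{n,m}(\mathbf{x}_n, \mathbf{x}_m). \tag{7}$$

By definition, $\beta(\mathbf{x}_1, \dots, \mathbf{x}_N) = f(s_1, \dots, s_K)$ and the ML
solution is $(\hat{\mathbf{x}}_1, \dots, \hat{\mathbf{x}}_N) = \arg\min \beta(\mathbf{x}_1, \dots, \mathbf{x}_N)$. If the ML
solution is unique then for each $n = 1, \dots, N$, we have
$\hat{\mathbf{x}}_n = \arg\min \beta_n(\mathbf{x}_n)$ where

$$\beta_n(\mathbf{x}_n) = \min_{\mathbf{x}_m \in \mathcal{A}_m,\ m \neq n} \beta(\mathbf{x}_1, \dots, \mathbf{x}_N). \tag{8}$$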

The definition of $\beta$ in (7) provides a natural 'factorization'
of the global kernel in terms of the functions $\alpha_n$ and $\alpha_{n,m}$
whose domains are much smaller than that of $\beta$, and hence
are easier to compute. From (2) and (8), we see that the ML
decoding of an STBC is an MPF problem, and hence it can be
solved using the GDL which efficiently processes the partial
sums of $\alpha_n$, $\alpha_{n,m}$ to compute the $\mathbf{x}_n$-marginalizations of $\beta$.
The ML solution for $\mathbf{x}_n$ can be obtained by first computing
the $\mathbf{x}_n$-marginalization of the global kernel $\beta$ in (7) and then
finding the argument $\mathbf{x}_n$ that minimizes $\beta_n$.

$$\mathbf{S}_2 = \begin{bmatrix} s_1c + s_3s + js_2c + js_4s & \gamma(s_5c + s_7s + js_6c + js_8s) \\ \gamma(-s_5s + s_7c - js_6s + js_8c) & -s_1s + s_3c - js_2s + js_4c \end{bmatrix} \tag{5}$$

When the ML solution is not unique an arbitration is
required after solving the MPF problem. To illustrate this,
consider the case $N = 2$ and say both $(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{a}_1, \mathbf{a}_2)$
and $(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{b}_1, \mathbf{b}_2)$ are ML solutions. On solving the
MPF problem (8) we would obtain a table of values for
the functions $\beta_1(\mathbf{x}_1)$ and $\beta_2(\mathbf{x}_2)$. However, both $\mathbf{a}_1$ and
$\mathbf{b}_1$ minimize $\beta_1$, and both $\mathbf{a}_2$ and $\mathbf{b}_2$ minimize $\beta_2$. Thus
we only know that the ML solutions belong to the set
$\{(\mathbf{a}_1, \mathbf{a}_2), (\mathbf{a}_1, \mathbf{b}_2), (\mathbf{b}_1, \mathbf{a}_2), (\mathbf{b}_1, \mathbf{b}_2)\}$. In order to obtain the
ML solutions, the ML metric $\|\mathbf{Y} - \mathbf{H}\mathbf{X}\|_F^2$ for each of these
tuples should be calculated. The following lemma says that
for an i.i.d. Rayleigh fading channel the ML solution of an
STBC is unique with probability 1, and hence this arbitration
step can be safely ignored.

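Lemma 1: Let $\mathcal{C}$ be any STBC, and let the entries of the
channel matrix $\mathbf{H}$ be i.i.d. complex Gaussian random variables
with zero mean and unit variance. Then with probability 1 the
ML solution for the transmitted codeword for the channel (4)
is unique.

Proof: Let $\mathbf{X}_1$ and $\mathbf{X}_2$ be two distinct codewords. We
will first show that with probability (w.p.) 1, $\mathbf{H}\mathbf{X}_1 \neq \mathbf{H}\mathbf{X}_2$,
and then show that given $\mathbf{H}\mathbf{X}_1 \neq \mathbf{H}\mathbf{X}_2$ the probability that
both $\mathbf{X}_1$ and $\mathbf{X}_2$ are ML solutions is 0. Since $\mathbf{X}_1 \neq \mathbf{X}_2$, there
exists a column of $(\mathbf{X}_1 - \mathbf{X}_2)$ which is non-zero. Suppose the
$j$th column of $(\mathbf{X}_1 - \mathbf{X}_2)$ is non-zero; then the $(1, j)$th entry of the
matrix $\mathbf{H}(\mathbf{X}_1 - \mathbf{X}_2)$ is a complex Gaussian random variable
with zero mean and non-zero variance. Hence the $(1, j)$th entry
of $\mathbf{H}(\mathbf{X}_1 - \mathbf{X}_2)$ is non-zero w.p. 1, and so $\mathbf{H}\mathbf{X}_1 \neq \mathbf{H}\mathbf{X}_2$
w.p. 1.

Now suppose $\mathbf{X}_0$ is the transmitted codeword and $\mathbf{H}$ is
such that $\mathbf{H}\mathbf{X}_1 \neq \mathbf{H}\mathbf{X}_2$. Let $\mathrm{vec}(\cdot)$ denote the vectorization
of a matrix. Then $\mathrm{vec}(\mathbf{H}(\mathbf{X}_1 - \mathbf{X}_0)) \neq \mathrm{vec}(\mathbf{H}(\mathbf{X}_2 - \mathbf{X}_0))$.
Since $\|\mathbf{Y} - \mathbf{H}\mathbf{X}_k\|_F = \|\mathbf{N} - \mathbf{H}(\mathbf{X}_k - \mathbf{X}_0)\|_F$, both $\mathbf{X}_1$ and
$\mathbf{X}_2$ will be ML solutions only if the
$n_rT$-dimensional white Gaussian noise vector $\mathrm{vec}(\mathbf{N})$ belongs
to the set of points in $\mathbb{C}^{n_rT}$ that are equidistant
from $\mathrm{vec}(\mathbf{H}(\mathbf{X}_1 - \mathbf{X}_0))$ and $\mathrm{vec}(\mathbf{H}(\mathbf{X}_2 - \mathbf{X}_0))$. Since
these two points are distinct, this set is a real hyperplane in
$\mathbb{C}^{n_rT}$, i.e., a coset of a $(2n_rT - 1)$-dimensional real subspace,
and the probability that $\mathrm{vec}(\mathbf{N})$ belongs to this hyperplane is 0. This
completes the proof. ■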

A junction tree to solve the MPF problem (8) is shown in
Fig. 1. The tree can be viewed as consisting of three sections.
At the center of the tree is the core consisting of only the
$(\mathbf{x}_1, \dots, \mathbf{x}_N)$ vertex. The core is surrounded by tier 1: a layer
of $(\mathbf{x}_n, \mathbf{x}_m)$ vertices, each of which is connected to the core
vertex by a single edge. Outermost is tier 2: a layer of $\mathbf{x}_n$
vertices, each of which is connected to a vertex from tier 1
by a single edge. The local kernel at the core is set identically
equal to zero, and the local kernels at the $(\mathbf{x}_n, \mathbf{x}_m)$ and $\mathbf{x}_n$ vertices
are set to $\alpha_{n,m}$ and $\alpha_n$ respectively. This tree satisfies all the
three conditions C.1-C.3 (given in Section II-B) for it to be a
junction tree for the MPF problem of ML decoding the STBC
$\mathcal{C}$. Conditions C.1 and C.2 are easy to check. To illustrate
the satisfiability of C.3 (the junction tree condition), Fig. 2
shows the subtree formed by the vertices whose local domains
contain the symbol $\mathbf{x}_2$. Clearly this subtree is a connected
graph.

Fig. 1. A junction tree to decode an arbitrary STBC.

Fig. 2. Subtree formed by the vertices that contain $\mathbf{x}_2$.

IV. Fast GDL Decodable Space-Time Block Codes

The junction tree of Fig. 1 has complexity order

$$\max_{v \in \mathcal{V}} |\mathcal{A}_{I_v}| = \prod_{n=1}^{N} |\mathcal{A}_n| = |\mathcal{C}|,$$

which is equal to the complexity order of brute-force ML
decoding. There exist codes whose weight matrices $\{\mathbf{A}_i\}$ are
such that the function $\alpha_{n,m}$ is identically equal to zero for all
channel realizations $\mathbf{H}$ for certain pairs $(n, m)$. In such cases
a number of 'factors' in the MPF formulation in (7) can be
dropped, and this can lead to junction trees whose order of
complexity is less than $|\mathcal{C}|$.

Definition 1: If an STBC $\mathcal{C}$ admits GDL decoding with
complexity order less than $|\mathcal{C}|$ then we say that it is fast GDL
decodable.

A number of properties of the GDL decoding of an STBC
can be readily inferred from what are known as the moral
graph of an STBC and the core of a junction tree. In the
following subsection we introduce these notions, and in
Section IV-B we give some results on the fast GDL decodability
of STBCs based on these concepts.

A. The Moral Graph and the Core 

The local kernels $\alpha_{n,m}(\mathbf{x}_n, \mathbf{x}_m)$ arise from the cross terms
$s_i s_j \xi_{i,j}(\mathbf{H})$, where $\xi_{i,j}(\mathbf{H}) = \mathrm{tr}(\mathbf{H}(\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H)\mathbf{H}^H)$. It is
well known [9]-[11] that a necessary and sufficient condition
for $\xi_{i,j}(\mathbf{H}) = 0$ for any channel realization $\mathbf{H}$ is that $\mathbf{A}_i$ and $\mathbf{A}_j$
be Hurwitz-Radon orthogonal, i.e., $\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H = \mathbf{0}_{n_t}$.
We say that two variables $\mathbf{x}_n$ and $\mathbf{x}_m$ interfere with each
other if there exists a symbol $s_i$ in the encoding group
$\mathbf{x}_n$ and a symbol $s_j$ in the encoding group $\mathbf{x}_m$ such that
$\mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H \neq \mathbf{0}_{n_t}$. If no such symbols $s_i$, $s_j$ exist we
say that $\mathbf{x}_n$ and $\mathbf{x}_m$ are non-interfering. The local kernel
$\alpha_{n,m}(\mathbf{x}_n, \mathbf{x}_m)$ is identically zero (and hence can be removed
in the MPF formulation) for all channel realizations if and
only if $\mathbf{x}_n$ and $\mathbf{x}_m$ are non-interfering. The moral graph [30]
of the MPF formulation of ML decoding an STBC is a simple$^2$
graph whose vertices are the variables $\mathbf{x}_n$, $n = 1, \dots, N$, and
in which an edge exists between two vertices if and only if
the two corresponding variables are interfering.
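The interference test and the resulting moral graph can be sketched as below. The function names, the 0-based group indices, the numerical tolerance and the two-matrix toy design are assumptions for illustration; the Alamouti weight matrices are those implied by the Alamouti design with independently encoded real symbols.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def herm(A):
    # Conjugate transpose.
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def hr_orthogonal(A, B, tol=1e-12):
    """True if A B^H + B A^H is the all-zero matrix (up to tol)."""
    P, Q = matmul(A, herm(B)), matmul(B, herm(A))
    return all(abs(P[r][c] + Q[r][c]) <= tol
               for r in range(len(P)) for c in range(len(P[0])))

def moral_graph(weights, groups):
    """Edges (n, m), n < m, between interfering encoding groups.

    weights: list of weight matrices A_i
    groups:  list of index lists, groups[n] playing the role of psi(n)
    """
    edges = set()
    for n in range(len(groups)):
        for m in range(n + 1, len(groups)):
            if any(not hr_orthogonal(weights[i], weights[j])
                   for i in groups[n] for j in groups[m]):
                edges.add((n, m))
    return edges

# Alamouti design [[s1 + j s2, -s3 + j s4], [s3 + j s4, s1 - j s2]].
alamouti = [[[1, 0], [0, 1]], [[1j, 0], [0, -1j]],
            [[0, -1], [1, 0]], [[0, 1j], [1j, 0]]]
edges_alamouti = moral_graph(alamouti, [[0], [1], [2], [3]])

# Toy design: two symbols sent from different antennas in the same slot.
toy = [[[1, 0], [0, 0]], [[0, 0], [1, 0]]]
edges_toy = moral_graph(toy, [[0], [1]])
```

For the Alamouti matrices no pair of symbols interferes, so the graph is edgeless, while the two symbols of the toy design share a time slot and therefore interfere.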

In the MPF formulation in (7) the kernels $\alpha_n(\mathbf{x}_n)$ arise
from the terms $\xi_i s_i$ and $\xi_{i,i} s_i^2$. Recall that $\xi_{i,i} = \|\mathbf{H}\mathbf{A}_i\|_F^2$,
and hence is non-zero with probability 1. Thus, the kernels
$\alpha_n$, $n = 1, \dots, N$, are almost always non-zero and can not
be removed from the MPF formulation. On the other hand, as
we saw in the previous paragraph, some of the cross terms
$\alpha_{n,m}$ can be made identically zero. This information about
the cross terms is embedded in the moral graph of the code.
Thus, all the information required to construct a junction tree
for a code is contained in its moral graph. We now show how
the problem of constructing a junction tree can be reduced to
the construction of what we refer to as the core. Let $\mathcal{T}$ be a
simple tree such that each vertex $v$ of $\mathcal{T}$ is associated with a
variable list $\mathbf{x}_{I_v}$ (for some $I_v \subseteq \{1, \dots, N\}$) and the kernel
$\alpha_v(\mathbf{x}_{I_v}) = 0$.

Definition 2: The tree $\mathcal{T}$ is said to be a core for the STBC
$\mathcal{C}$ if (i) it satisfies the junction tree condition (condition C.3 of
Section II-B), and (ii) for every pair of neighboring vertices
$(\mathbf{x}_n, \mathbf{x}_m)$ in the moral graph, there exists a vertex $v$ of $\mathcal{T}$ such
that $\{\mathbf{x}_n, \mathbf{x}_m\} \subseteq \mathbf{x}_{I_v}$.

Given a core $\mathcal{T}$, a junction tree for the STBC can be
constructed as follows. For every pair $(\mathbf{x}_n, \mathbf{x}_m)$ of neighboring
vertices in the moral graph, choose a vertex $v$ of $\mathcal{T}$ such that
$\{\mathbf{x}_n, \mathbf{x}_m\} \subseteq \mathbf{x}_{I_v}$. If $I_v = \{n, m\}$ then set the local kernel
at $v$ to $\alpha_{n,m}$, else attach a vertex $(\mathbf{x}_n, \mathbf{x}_m)$ with local kernel
$\alpha_{n,m}$ to $v$ using a single edge. The set of $(\mathbf{x}_n, \mathbf{x}_m)$ vertices
thus added to $\mathcal{T}$ form tier 1. Now, for each $n = 1, \dots, N$,
find a vertex of tier 1 that contains the variable $\mathbf{x}_n$ and attach
the vertex $(\mathbf{x}_n)$ with the local kernel $\alpha_n$ to that vertex using
a single edge. If there exists no tier 1 vertex that contains
$\mathbf{x}_n$ then connect the $(\mathbf{x}_n)$ vertex with local kernel $\alpha_n$ to any
vertex of tier 1 using a single edge. The set of $(\mathbf{x}_n)$ vertices
thus added form tier 2. It is straightforward to show that the
graph thus obtained is a junction tree for the STBC $\mathcal{C}$.

$^2$A graph is said to be simple if it is undirected, unweighted with no loops
or multiple edges.

Fig. 3. Moral graph of Example 1.

Fig. 4. The core of Example 1.

Example 1: Consider a code with $N = 5$
encoding groups and moral graph as shown in
Fig. 3. There are five pairs of interfering symbols
$\{(\mathbf{x}_1, \mathbf{x}_2), (\mathbf{x}_1, \mathbf{x}_3), (\mathbf{x}_2, \mathbf{x}_3), (\mathbf{x}_2, \mathbf{x}_4), (\mathbf{x}_3, \mathbf{x}_4)\}$. A core
for this code is shown in Fig. 4. The core together with the
tier 1 vertices is shown in Fig. 5. Note that the $(\mathbf{x}_2, \mathbf{x}_3)$ vertex
of tier 1 could have been connected to the bottom vertex
of the core as well. The complete junction tree is shown in
Fig. 6. The vertex $(\mathbf{x}_5)$ has been connected to an arbitrarily
chosen tier 1 vertex. The complexity order of this junction
tree is $\max\{|\mathcal{A}_{\{1,2,3\}}|, |\mathcal{A}_{\{2,3,4\}}|\} < |\mathcal{C}|$ and hence this code
is fast GDL decodable. ■
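The two conditions of Definition 2 and the resulting complexity order can be checked mechanically. The sketch below does so for Example 1, assuming that the core of Fig. 4 consists of the two vertices $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ and $(\mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4)$ joined by an edge (as suggested by the complexity expression above), and assuming a hypothetical alphabet size of 4 per encoding group. The function names and vertex labels are illustrative only.

```python
from collections import deque

def junction_tree_condition(domains, edges, num_vars):
    """Check condition C.3: for each variable, the vertices whose local
    domains contain it must induce a connected subgraph."""
    adj = {v: set() for v in domains}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for n in range(1, num_vars + 1):
        holders = {v for v, dom in domains.items() if n in dom}
        if not holders:
            continue  # variable absent from the core: nothing to check
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u] & holders:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != holders:
            return False
    return True

def is_core(domains, edges, moral_edges, num_vars):
    """Definition 2 (the input is assumed to already be a tree)."""
    covered = all(any({n, m} <= set(dom) for dom in domains.values())
                  for n, m in moral_edges)
    return covered and junction_tree_condition(domains, edges, num_vars)

def complexity_order(domains, sizes):
    """max over vertices of |A_{I_v}|, with sizes[n] = |A_n|."""
    best = 0
    for dom in domains.values():
        size = 1
        for n in dom:
            size *= sizes[n]
        best = max(best, size)
    return best

moral_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
core = {"top": (1, 2, 3), "bottom": (2, 3, 4)}
core_edges = [("top", "bottom")]
sizes = {n: 4 for n in range(1, 6)}  # hypothetical |A_n| = 4

ok = is_core(core, core_edges, moral_edges, 5)
order = complexity_order(core, sizes)
bad = is_core({"u": (1, 2), "v": (3, 4)}, [("u", "v")], moral_edges, 5)
```

With these assumed alphabet sizes the core has order $4^3 = 64$, compared with $|\mathcal{C}| = 4^5 = 1024$ for brute-force decoding.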

Given the moral graph of an STBC, the problem of
constructing a junction tree is equivalent to the problem of
constructing a core. There is no unique core for a given
STBC/moral graph, and different cores can lead to junction
trees with different complexities. For instance, the graph
with the single vertex $(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N)$ can always be used
as a core irrespective of the structure of the moral graph
(see Fig. 1). However this would lead to junction trees with
complexity order $|\mathcal{A}_{\{1,\dots,N\}}| = |\mathcal{C}|$, which is equal to the order
of brute-force ML decoding complexity.

When the moral graph is not edgeless, i.e., when there is
at least one pair of interfering symbols, the complexity order
of the junction tree is determined by the core vertices. Since
every pair of interfering vertices must be contained within
some 'larger' vertex of the core, the vertex $v$ of the junction
tree with the largest $|\mathcal{A}_{I_v}|$ belongs to the core. Thus, given
an STBC/moral graph, the problem of finding an efficient ML
decoder is equivalent to one of constructing a core with the
least complexity.

Fig. 5. The core $\mathcal{T}$ of Example 1 with tier 1 vertices.

Fig. 6. The junction tree of Example 1.



When the moral graph is edgeless, i.e., when none of the
symbols are interfering with each other, any tree $\mathcal{G}$ with $N$
vertices can be transformed into a junction tree by labeling the
$N$ vertices with the local domains $(\mathbf{x}_n)$ and the local kernels
$\alpha_n$, $n = 1, \dots, N$ respectively. Since there are no cross terms
$\alpha_{n,m}$ in the MPF formulation, the ML metric

$$f(s_1, \dots, s_K) = \sum_{n=1}^{N} \alpha_n(\mathbf{x}_n) = \beta.$$

Since every variable $\mathbf{x}_n$ appears in exactly one of the vertices
of $\mathcal{G}$, the tree $\mathcal{G}$ satisfies the junction tree condition as well.
Hence $\mathcal{G}$ is a junction tree for the given STBC. The complexity
order of this junction tree is $\max_{n} |\mathcal{A}_n| < |\mathcal{C}|$. Thus, STBCs
with edgeless moral graphs are fast GDL decodable.

Example 2: All Orthogonal Designs [1] have edgeless
moral graphs. For example, consider the Alamouti Code

$$\begin{bmatrix} s_1 + js_2 & -s_3 + js_4 \\ s_3 + js_4 & s_1 - js_2 \end{bmatrix}$$

where the real symbols $s_1, \dots, s_4$ are encoded independently
using a PAM constellation. This code has $N = 4$ encoding
groups $\mathbf{x}_n = [s_n]$, $n = 1, \dots, 4$. The moral graph, see Fig. 7,
is edgeless. A junction tree for the Alamouti code is shown
in Fig. 8. ■

Fig. 7. Moral graph of the Alamouti Code.

Fig. 8. A junction tree for the Alamouti Code.

B. Fast GDL Decodable STBCs 

We now give a sufficient condition for a code to admit fast 
GDL decoding. 

Lemma 2: A code admits fast GDL decoding if its moral 
graph is not complet^ 

Proof: We prove the claim by constructing a core for such a code $\mathcal{C}$ with complexity order less than $|\mathcal{C}|$. Since the moral graph is not complete, there exists a pair of variables, say $x_1$ and $x_2$, that are not connected by an edge in the moral graph. Consider the tree shown in Fig. 9. Each of the two vertices of this tree contains $(N-1)$ variables. It is straightforward to show that this tree satisfies both the conditions of Definition 2 to be a core for the given STBC. The order of GDL decoding complexity with this core is

$$\max\left\{ |\mathcal{A}_{\{1,3,4,\dots,N\}}|,\ |\mathcal{A}_{\{2,3,\dots,N\}}| \right\} < |\mathcal{A}_{\{1,2,3,\dots,N\}}| = |\mathcal{C}|,$$

and hence this code is fast GDL decodable. ■
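The construction in the proof of Lemma 2 can be sketched in a few lines. The following is only an illustration: the function names and the representation of the moral graph as a list of index pairs are assumptions made here, not notation from the paper.

```python
from itertools import combinations


def is_complete(num_vars, edges):
    """Return True if every pair of distinct variables is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    pairs = combinations(range(1, num_vars + 1), 2)
    return all(frozenset(p) in edge_set for p in pairs)


def lemma2_core_domains(num_vars, edges):
    """Return the two local domains of a two-vertex core, or None.

    Following the proof of Lemma 2: pick a non-interfering pair (a, b) and
    build two vertices, one omitting b and one omitting a. Each domain then
    holds num_vars - 1 variables. None is returned for a complete graph.
    """
    edge_set = {frozenset(e) for e in edges}
    everyone = set(range(1, num_vars + 1))
    for a, b in combinations(range(1, num_vars + 1), 2):
        if frozenset((a, b)) not in edge_set:
            return everyone - {b}, everyone - {a}
    return None
```

For a three-variable chain with edges (1, 2) and (2, 3), the pair (1, 3) is non-interfering, so the sketch returns the domains {1, 2} and {2, 3}.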
Example T.2: Continuing with Example T.1, the moral graph of the $2 \times 10$ Toeplitz code is given in Fig. 10. The moral graph is not complete and hence this code admits fast GDL decoding. ■
Example G.2: We now continue with Example G.l. First 
consider the naive choice of encoding groups with just two 
symbol groups. Since Ai + AsA^ 7^ O2, the two symbol 
groups interfere and hence the moral graph is complete. 
Now consider the second choice of weight matrices and encoding groups with 8 symbol groups. The moral graph, shown in Fig. 11, is not complete and hence with this choice of weight matrices the Golden code admits fast GDL decoding. ■
Multigroup Decodable STBCs: Let $\mathcal{G}$ be a junction tree for an STBC $\mathcal{C}$, and let there be $(g-1)$ edges $(u_k, v_k)$, $k = 1, \dots, (g-1)$, of $\mathcal{G}$ such that $x_{I_{u_k}} \cap x_{I_{v_k}} = \phi$, the empty set. Let $\mathcal{G}_1, \dots, \mathcal{G}_g$ be the $g$ disjoint subtrees of $\mathcal{G}$ obtained by removing these $(g-1)$ edges. Also, denote by $x(\mathcal{G}_k)$ the union of the sets of variables that appear in the local domains of $\mathcal{G}_k$.

Theorem 1: For $\mathcal{G}, \mathcal{G}_1, \dots, \mathcal{G}_g$ described as above, we have:
1) $x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ is a partition of $\{x_1, \dots, x_N\}$,

^1 A simple graph is said to be complete if every pair of distinct vertices is connected by an edge.



Fig. 9. The core used in the proof of Lemma 2.



Fig. 10. Moral graph of the Toeplitz code in Example T.1.



2) for $k = 1, \dots, g$, the tree $\mathcal{G}_k$ satisfies the junction tree condition, and

3) for each $k = 1, \dots, g$, the ML solution of $x(\mathcal{G}_k)$ can be obtained by running the GDL message-passing algorithm on $\mathcal{G}_k$.

Proof: The proof is given in Appendix A. ■
We say that $\mathcal{G}_1, \dots, \mathcal{G}_g$ is a partition of the junction tree $\mathcal{G}$, and that the STBC is GDL decodable using these $g$ independent junction trees. Each subtree $\mathcal{G}_k$ is composed only of a specific subset $x(\mathcal{G}_k)$ of variables, hence for any vertex $v_k$ of $\mathcal{G}_k$ we have $I_{v_k} \subsetneq \{1, \dots, N\}$. Thus, the complexity order of $\mathcal{G}$ is

$$\max_{v \in \mathcal{G}} |\mathcal{A}_{I_v}| = \max_{k \in \{1,\dots,g\}} \max_{v_k \in \mathcal{G}_k} |\mathcal{A}_{I_{v_k}}| < |\mathcal{C}|.$$

Thus, codes whose junction trees can be partitioned into two or more subtrees are fast GDL decodable.

Example 3: Consider the junction tree of Example 1 shown in Fig. 6. Among the 11 edges of this tree, the edge $(u, v)$ between the nodes $(x_2, x_4)$ and $(x_5)$ is the only one such that $x_{I_u} \cap x_{I_v} = \phi$. Thus, in this case $g = 2$ and the two subtrees are shown in Fig. 12. The sets of variables are $x(\mathcal{G}_1) = \{x_1, x_2, x_3, x_4\}$ and $x(\mathcal{G}_2) = \{x_5\}$. The ML solutions of $x(\mathcal{G}_1)$ and $x(\mathcal{G}_2)$ can be obtained by running the GDL independently on $\mathcal{G}_1$ and $\mathcal{G}_2$ respectively. Note that the corresponding moral graph, shown in Fig. 3, is a disjoint union of $g = 2$ subgraphs. Further, the first subgraph is composed of variables from the set $x(\mathcal{G}_1)$ and the second from the set $x(\mathcal{G}_2)$. ■

Example 4: All the three edges of the junction tree of the Alamouti code, shown in Fig. 8, satisfy the condition $x_{I_u} \cap x_{I_v} = \phi$. In this case $g = 4$, and the $k$-th subtree $\mathcal{G}_k$ consists of a single vertex $(x_k)$ with the local kernel $\alpha_k(x_k)$. Note that the moral graph of this code, shown in Fig. 7, is a disjoint union of $g = 4$ subgraphs, and the $k$-th subgraph of the moral graph is composed of variables from $x(\mathcal{G}_k)$. ■
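The partition described before Theorem 1 amounts to deleting every edge whose two local domains share no variable and collecting the variables of each remaining subtree. The sketch below illustrates this with a small union-find; the vertex labels, the dictionary layout and the path-shaped Alamouti tree are assumptions chosen for the illustration, since the exact shape of the tree in Fig. 8 is not reproduced here.

```python
def partition_junction_tree(domains, edges):
    """Split a junction tree at edges whose endpoint domains are disjoint.

    domains: dict mapping a vertex label to its set of variable indices.
    edges:   list of (u, v) vertex-label pairs forming a tree.
    Returns the variable sets of the resulting subtrees, ordered by their
    smallest variable index.
    """
    parent = {v: v for v in domains}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Keep only edges whose local domains overlap, and merge their endpoints.
    for u, v in edges:
        if domains[u] & domains[v]:
            parent[find(u)] = find(v)

    groups = {}
    for v, dom in domains.items():
        groups.setdefault(find(v), set()).update(dom)
    return sorted(groups.values(), key=min)
```

With four single-variable vertices joined in a path, as assumed here for the Alamouti code, every edge is removed and the sketch returns four singleton groups, in line with Example 4.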
We will see in Lemmas 3 and 4 that the property of a junction tree to be partitioned into several smaller junction trees is related to multigroup decodability of a code, and as illustrated in the previous two examples, this property can be readily inferred from the moral graph. An STBC is said to be multigroup or $g$-group decodable [9]-[11] if $\{x_1, \dots, x_N\}$ can be partitioned into $g$ subsets such that each subset of symbols can be ML decoded independently of other subsets. If the code generated by the $k$-th group of symbols is $\mathcal{C}_k$, then the $k$-th symbol group is ML decoded by the CML algorithm



Fig. 11. Moral graph of the Golden Code. 




Fig. 12. The subtrees $\mathcal{G}_1$ and $\mathcal{G}_2$ of Example 3.

independent of other symbol groups as

$$\arg\min_{X_k \in \mathcal{C}_k} \| Y - H X_k \|_F^2.$$

Thus, in order to decode $\mathcal{C}$, the $g$ subcodes $\mathcal{C}_1, \dots, \mathcal{C}_g$ are decoded independently by the CML decoder. A necessary and sufficient condition for $g$-group decodability is that the weight matrices of the variables belonging to different subsets be Hurwitz-Radon orthogonal [9]-[11]. In terms of the GDL formulation, this translates to the variables belonging to different subsets being non-interfering.

Lemma 3: An STBC is $g$-group decodable if and only if its moral graph is a disjoint union of $g$ subgraphs.

Proof: The proof is straightforward. ■

Using this lemma we see that any code with the moral graph of Fig. 3 is 2-group decodable, and that the Alamouti code is 4-group decodable.
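Lemma 3 reduces the multigroup test to finding the connected components of the moral graph. A minimal sketch follows; the edge-list representation and the function name are assumptions made for this illustration, and the five-variable graph used in the usage note is hypothetical (it only reproduces the grouping of Example 3, not the actual edges of Fig. 3).

```python
def moral_graph_groups(num_vars, edges):
    """Return the connected components of a moral graph.

    Variables are indexed 1..num_vars; edges is a list of interfering pairs.
    The number of components is the largest g for which the code is
    g-group decodable in the sense of Lemma 3.
    """
    adjacency = {n: set() for n in range(1, num_vars + 1)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for start in range(1, num_vars + 1):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search from an unvisited variable
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(adjacency[n] - component)
        seen |= component
        groups.append(component)
    return groups
```

An edgeless graph on four variables, as for the Alamouti code, yields four singleton groups. A hypothetical graph in which $x_1, \dots, x_4$ are connected and $x_5$ is isolated yields the two groups $\{1,2,3,4\}$ and $\{5\}$.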

Lemma 4: An STBC can be GDL decoded using a disjoint union of $g$ junction trees if and only if it is $g$-group decodable.

Proof: Suppose an STBC has a junction tree $\mathcal{G}$ that can be partitioned into $g$ subtrees $\mathcal{G}_1, \dots, \mathcal{G}_g$. From Theorem 1, $x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ form a partition of the variables $\{x_1, \dots, x_N\}$. Consider any two variables $x_n$ and $x_m$ belonging to distinct subsets of this partition. From Theorem 1, there exists no vertex in $\mathcal{G}$ whose local domain contains both $x_n$ and $x_m$. Thus, the global kernel does not involve the function $\alpha_{n,m}$, and hence $x_n$ and $x_m$ are non-interfering. We have thus shown that the variables belonging to the $g$ subsets $x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ are mutually non-interfering. Hence, the moral graph is a disjoint union of $g$ subgraphs, and from Lemma 3 the code is $g$-group decodable.

Suppose an STBC is $g$-group decodable. Then from Lemma 3 its moral graph is a disjoint union of $g$ subgraphs. For $k = 1, \dots, g$, let $\Gamma_k \subset \{1, \dots, N\}$ be the set of indices of the variables in the $k$-th disjoint subgraph of the moral graph. One can then construct the $k$-th disjoint subtree $\mathcal{G}_k$ of the junction tree $\mathcal{G}$ similar to the construction in Section III-B, see Fig. 1. The central node of $\mathcal{G}_k$ consists of all the variables $x_n$, $n \in \Gamma_k$. The domains $(x_n, x_m)$ and $x_n$, for $n, m \in \Gamma_k$, are then attached in two tiers, similar to the tree in Fig. 1. The junction tree $\mathcal{G}$ is obtained by arbitrarily connecting these $g$ subtrees using $(g-1)$ edges. It is straightforward to see that the resulting tree is a junction tree for the code, and that $\mathcal{G}_1, \dots, \mathcal{G}_g$ form a partition of $\mathcal{G}$. Hence from Theorem 1 the code can be GDL decoded using a partition of $g$ disjoint junction trees. ■

When a code is $g$-group decodable, the $k$-th subcode is generated by the variables associated with the $k$-th disjoint subgraph of the moral graph. A junction tree partition for this code can be obtained by constructing $g$ junction trees, one each for the $g$ subgraphs of the moral graph.

Fast-Decodable STBCs: An STBC is said to be fast- 
decodable or conditionally g-group decodable M24I if 
there exists a subset $\Gamma \subsetneq \{1, \dots, N\}$ such that the code generated by the variables $x_n$, $n \in \Gamma$, is $g$-group decodable. The CML decoding algorithm to decode such a code proceeds as follows. For each of the $|\mathcal{A}_{\Gamma^c}|$ values that the variables $x_{\Gamma^c}$ jointly assume, the conditionally optimal values of the remaining variables $x_n$, $n \in \Gamma$, are found via $g$-group decoding. Note that each of these $g$ subcodes can itself be fast-decodable (such codes are said to be fast-group-decodable [43]). From among these $|\mathcal{A}_{\Gamma^c}|$ values of $x_{\{1,\dots,N\}}$, the realization of $x_{\{1,\dots,N\}}$ that minimizes the ML metric $\|Y - HX\|_F^2$ is found in a brute-force way. Let the $g$ subcodes correspond to the variables with index sets $\Gamma_1, \dots, \Gamma_g$ and let the complexity order of decoding the $k$-th subcode using CML be $O_k$. For each $k = 1, \dots, g$, the complexity order $O_k \le |\mathcal{A}_{\Gamma_k}|$. The complexity order of the CML algorithm is then

$$|\mathcal{A}_{\Gamma^c}| \max_{k \in \{1,\dots,g\}} O_k \le |\mathcal{A}_{\Gamma^c}| \max_{k \in \{1,\dots,g\}} |\mathcal{A}_{\Gamma_k}| < |\mathcal{C}|.$$



Lemma 5: An STBC is conditionally $g$-group decodable if and only if there exists a $\Gamma \subsetneq \{1, \dots, N\}$ such that the moral graph of the reduced set of variables $\{x_n \mid n \in \Gamma\}$ is a disjoint union of $g$ subgraphs.

Proof: Follows immediately from Lemma 3. ■

From Lemmas 2 and 5 we see that conditionally $g$-group ML decodable codes admit fast GDL decoding.

Example T.3: Consider the Toeplitz code of Example T.2. With $\Gamma = \{1, \dots, 9\} \setminus \{5\}$ we see that the moral graph generated by $x_\Gamma$ is a disjoint union of 2 subgraphs (see Fig. 13). The first subgraph consists of the symbols $x_1, \dots, x_4$ and the second subgraph consists of $x_6, \dots, x_9$. Hence this code is conditionally 2-group decodable. Note that the code generated by the variables $x_1, \dots, x_4$ is itself conditionally 2-group decodable where the two conditional groups are $\{x_1\}$ and $\{x_4\}$. Similarly the code generated by $x_6, \dots, x_9$ is conditionally 2-group decodable as well. ■
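The test in Lemma 5 can be illustrated by deleting the conditioned variables from the moral graph and listing the components that remain. In the sketch below the moral graph of the $2 \times 10$ Toeplitz code is assumed to be a chain $x_1 - x_2 - \cdots - x_9$; this is an assumption consistent with the groupings quoted in Example T.3, not a transcription of Fig. 10, and the function name is likewise chosen only for this illustration.

```python
def conditional_groups(num_vars, edges, conditioned):
    """Components of the moral graph after removing the conditioned variables.

    num_vars:    number of encoding groups, indexed 1..num_vars.
    edges:       list of interfering index pairs.
    conditioned: indices of the variables that are fixed (the complement of
                 the reduced set of variables).
    """
    remaining = set(range(1, num_vars + 1)) - set(conditioned)
    adjacency = {n: set() for n in remaining}
    for a, b in edges:
        if a in remaining and b in remaining:
            adjacency[a].add(b)
            adjacency[b].add(a)

    groups, unvisited = [], set(remaining)
    while unvisited:
        stack, component = [min(unvisited)], set()
        while stack:
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(adjacency[n] - component)
        unvisited -= component
        groups.append(component)
    return groups


# Assumed chain-shaped moral graph for the 2 x 10 Toeplitz code.
toeplitz_chain = [(n, n + 1) for n in range(1, 9)]
```

Under the chain assumption, conditioning on $x_5$ leaves the groups $\{1,2,3,4\}$ and $\{6,7,8,9\}$, and conditioning the first group on $\{x_2, x_3\}$ leaves $\{1\}$ and $\{4\}$, matching the groups named in Example T.3.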









Fig. 13. Toeplitz code: Moral graph of the reduced set of variables $x_\Gamma$.



Fig. 14. Golden code: Moral graph of the reduced set of variables $x_\Gamma$.



Example G.3: Consider the moral graph of the Golden code given in Fig. 11. For $\Gamma = \{1, 2, 3, 4\}$, the moral graph generated by the variables $\{x_1, \dots, x_4\}$, shown in Fig. 14, is a disjoint union of 2 subgraphs. The first subgraph consists of the variables $x_1, x_3$ and the second subgraph consists of the variables $x_2, x_4$. Thus the Golden code is conditionally 2-group decodable. This fast-decodability property of the Golden code was
first reported in HS) , lH . ■ 

V. GDL is Faster than Conditional ML Decoding

In this section we show that the number of computations 
involved in the GDL decoding of any STBC is less than that of 
CML decoding. As a first step towards this, we show that ML 
solutions can be obtained using only the single-vertex GDL 
algorithm followed by a 'traceback', rather than the more com- 
plex all-vertex GDL. This reduction is possible since we are 
only interested in the arg min of the objective functions at the 
various vertices, and not the objective functions themselves. 

A. Traceback 

Let $\mathcal{G}$ be any junction tree for the STBC $\mathcal{C}$ with the encoding groups $x_1, \dots, x_N$. We will now show that the ML solutions of $\{x_n\}$ can be obtained by running the single-vertex GDL with any vertex $v_0$ as the root, followed by a traceback step. This is similar to the Viterbi algorithm [32], where the actual ML metric of only the last state of the trellis is calculated and then the ML path is traced back to the first state.

Consider the single-vertex GDL message-passing schedule with $v_0$ as the root. Every vertex $u \neq v_0$ sends a message to its neighbor $p(u)$ on the unique path from $u$ to $v_0$ when it has received messages from all its other neighbors. While doing so it computes its partial state

$$\lambda_u(x_{I_u}) = \alpha_u(x_{I_u}) + \sum_{w \text{ adj } u,\ w \neq p(u)} \mu_{w,u}(x_{I_w \cap I_u})$$

and sends the message $\mu_{u,p(u)}$ as

$$\mu_{u,p(u)}(x_{I_u \cap I_{p(u)}}) = \min_{x_{I_u \setminus I_{p(u)}}} \lambda_u(x_{I_u}).$$

Note that this partial state is different from the state $\sigma_u$ of $u$ at the end of the all-vertex GDL algorithm. These two functions are related as

$$\sigma_u(x_{I_u}) = \lambda_u(x_{I_u}) + \mu_{p(u),u}(x_{I_{p(u)} \cap I_u}),$$

where $\mu_{p(u),u}$ is the message from $p(u)$ to $u$ during the all-vertex GDL. However, the message $\mu_{p(u),u}$ is not generated during the single-vertex schedule. At the end of the single-vertex GDL, $v_0$ calculates its state $\sigma_{v_0}$, which is equal to the $x_{I_{v_0}}$-marginalization of $\beta$. The ML solution to $x_{I_{v_0}}$ is obtained as $\hat{x}_{I_{v_0}} = \arg\min \sigma_{v_0}(x_{I_{v_0}})$.

Let $u$ be any vertex such that the ML solution of the local domain of $p(u)$, i.e., $\hat{x}_{I_{p(u)}}$, is known. Partition $x_{I_u}$ into $x_{A(u)} = x_{I_u \setminus I_{p(u)}}$ and $x_{B(u)} = x_{I_u \cap I_{p(u)}}$. Note that both $\lambda_u$ and $\sigma_u$ are functions of both $x_{A(u)}$ and $x_{B(u)}$. Since the ML solution at $p(u)$ is known, the value $\hat{x}_{B(u)}$ that minimizes $\sigma_u(x_{A(u)}, x_{B(u)})$ is known. Thus, the ML solution of $x_{A(u)}$ is

$$\hat{x}_{A(u)} = \arg\min_{x_{A(u)}} \sigma_u(x_{A(u)}, \hat{x}_{B(u)})$$
$$= \arg\min_{x_{A(u)}} \left[ \lambda_u(x_{A(u)}, \hat{x}_{B(u)}) + \mu_{p(u),u}(\hat{x}_{B(u)}) \right]$$
$$= \arg\min_{x_{A(u)}} \lambda_u(x_{A(u)}, \hat{x}_{B(u)}).$$

Hence, the ML solution at $u$ can be obtained merely from $\lambda_u$ and the ML solution at $p(u)$. This is possible since we are only interested in $\arg\min \sigma_u$ rather than $\sigma_u$ itself, and as shown above, $\arg\min \sigma_u$ can be obtained from $\lambda_u$ without calculating $\sigma_u$ explicitly. At the end of the single-vertex schedule, the solution at $v_0$ is first found, followed by all its neighbors, and then the neighbors of these vertices, and so on, until the ML solutions of all the variables $x_n$, $n = 1, \dots, N$, are obtained. Since the all-vertex GDL is about four times as complex as the single-vertex GDL, this traceback algorithm provides a considerable reduction in complexity.
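The single-vertex schedule with traceback can be illustrated on the simplest non-trivial junction tree, a chain of vertices $(x_n, x_{n+1})$ rooted at the last variable. The sketch below is a min-sum illustration under assumptions made here: costs are given as plain tables (`unary[n][a]` standing in for $\alpha_n$ and `pairwise[n][a][b]` for $\alpha_{n,n+1}$), and the function names are hypothetical. It stores the partial states $\lambda_u$ during the forward pass and recovers each variable by an arg min over $\lambda_u$ given the already decided parent value, as in the derivation above.

```python
from itertools import product


def gdl_chain_decode(unary, pairwise):
    """Min-sum single-vertex schedule on a chain, followed by traceback."""
    num_vars = len(unary)
    incoming = [0.0] * len(unary[0])  # no message reaches the first vertex
    partial_states = []
    for n in range(num_vars - 1):
        # Partial state of vertex (x_n, x_{n+1}): kernels plus incoming message.
        lam = [[unary[n][a] + incoming[a] + pairwise[n][a][b]
                for b in range(len(unary[n + 1]))]
               for a in range(len(unary[n]))]
        partial_states.append(lam)
        # Message to the parent: minimise over x_n, keep x_{n+1}.
        incoming = [min(row[b] for row in lam)
                    for b in range(len(unary[n + 1]))]

    root_state = [unary[-1][b] + incoming[b] for b in range(len(unary[-1]))]
    best_cost = min(root_state)
    solution = [root_state.index(best_cost)]
    # Traceback: arg min of the stored partial state given the parent's value.
    for lam in reversed(partial_states):
        b = solution[-1]
        column = [row[b] for row in lam]
        solution.append(column.index(min(column)))
    solution.reverse()
    return best_cost, solution


def total_cost(unary, pairwise, values):
    """Evaluate the global metric for one assignment of value indices."""
    cost = sum(unary[n][v] for n, v in enumerate(values))
    cost += sum(pairwise[n][values[n]][values[n + 1]]
                for n in range(len(values) - 1))
    return cost


def brute_force_cost(unary, pairwise):
    """Reference minimum obtained by exhaustive search."""
    ranges = [range(len(u)) for u in unary]
    return min(total_cost(unary, pairwise, v) for v in product(*ranges))
```

The forward pass never forms the all-vertex states $\sigma_u$; only the root state and the stored $\lambda_u$ tables are used, which is the point of the traceback argument.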

Example 5: The direction of messages for the single-vertex GDL problem on the subtree $\mathcal{G}_1$ of Example 3 with root at the vertex $(x_1, x_3)$ is shown in Fig. 15. In this example, $p(b) = p(c) = a$, $p(d) = p(e) = p(g) = c$, $p(f) = e$, $p(h) = p(i) = g$, $p(u) = h$ and $p(v) = i$. At the end of the GDL schedule the state $\sigma_a$ at the vertex $a$ is equal to the $(x_1, x_3)$-marginalization of the global kernel. The optimal $(x_1, x_3)$ is found from $\sigma_a$ using $(|\mathcal{A}_1||\mathcal{A}_3| - 1)$ pairwise comparisons. Since $p(c) = a$, using the knowledge of $x_1, x_3$ and $\lambda_c$, the value of $x_2$ can then be found. This step involves $(|\mathcal{A}_2| - 1)$ comparisons. Finally, given $x_2, x_3$ and $\lambda_g$, the value of $x_4$ can be obtained using $(|\mathcal{A}_4| - 1)$ comparisons. If $|\mathcal{A}_1| = \cdots = |\mathcal{A}_4| = q$, then finding the optimal $x_n$, $n = 1, \dots, 4$, using the single-vertex GDL and traceback involves $7q^3 + 4q^2 + 2q - 3$ operations. On the other hand, using the all-vertex GDL would cost $28q^3 + 12q^2 + 4q - 1$ operations. Comparing the leading order terms, we see that traceback has enabled us to reduce the complexity by about 4 times. ■

B. GDL is faster than Conditional ML decoding 

Before stating the results of this subsection, we define the GDL and conditional ML decoding complexities of an STBC, denoted by $C_{\mathrm{GDL}}(\mathcal{C})$ and $C_{\mathrm{CML}}(\mathcal{C})$ respectively. The




Fig. 15. Direction of messages for the single-vertex GDL for root vertex a. 



GDL algorithm varies with the choice of the weight matrices, encoding groups and the junction tree. By $C_{\mathrm{GDL}}(\mathcal{C})$ is meant the minimum among the complexities (the number of mathematical operations: multiplications, additions and comparisons) of all possible GDL algorithms that can be used to solve the ML decoding problem of $\mathcal{C}$. Similarly, for the CML algorithm there can be more than one choice of the reduced set of variables $x_\Gamma$ which generate a multigroup decodable code. The complexity of conditional ML decoding then varies with this choice. By $C_{\mathrm{CML}}(\mathcal{C})$ is meant the minimum among all possible conditional ML decoding complexities of the code $\mathcal{C}$. By $O_{\mathrm{GDL}}(\mathcal{C})$ and $O_{\mathrm{CML}}(\mathcal{C})$ we denote the order of $C_{\mathrm{GDL}}(\mathcal{C})$ and $C_{\mathrm{CML}}(\mathcal{C})$ in terms of the signal set/constellation size.

Order of decoding complexity: 

We now show that the order of GDL complexity of any code is upper bounded by the order of its CML complexity.

Theorem 2: For any code $\mathcal{C}$, $O_{\mathrm{GDL}}(\mathcal{C}) \le O_{\mathrm{CML}}(\mathcal{C})$.
Proof: The proof is given in Appendix B. ■

The following example shows that there exist codes for 
which the GDL complexity order is strictly less. Thus the 
CML decoding algorithm is in general suboptimal in terms of 
reducing the ML decoding complexity. 

Example T.4: The $2 \times 10$ Toeplitz code can be decoded using the junction tree given in Fig. 16 at the top of the next page. If the size of the complex HEX constellation used to encode the variables $x_n = [s_{2n-1}\ s_{2n}]$ is $M$, then the complexity order of this junction tree is $|\mathcal{A}_{\{n,n-1\}}| = M^2$. The least complex CML algorithm proceeds as follows. The variables $\{x_1, \dots, x_4\}$ and $\{x_6, \dots, x_9\}$ are independently decoded after conditioning on $x_5$. To decode $\{x_1, \dots, x_4\}$, one first conditions on $\{x_2, x_3\}$ and finds the conditionally optimal values of $x_1$ and $x_4$ independently. The decoding of $\{x_6, \dots, x_9\}$ proceeds in a similar way. Thus the CML complexity order is $M^4$. On the other hand, the brute-force decoding complexity is $|\mathcal{C}| = M^9$. Hence, for this code $O_{\mathrm{GDL}} < O_{\mathrm{CML}} < |\mathcal{C}|$. ■

We now give two examples of families of STBCs for which $O_{\mathrm{GDL}} < O_{\mathrm{CML}}$.
Fig. 16. Junction tree to decode the $2 \times 10$ Toeplitz code.



Fig. 17. The moral graph of the $4 \times 14$ OAC.

1) Toeplitz Codes [39]: Consider a $2 \times T$ Toeplitz code, $T > 2$. This code consists of $K = 2(T-1)$ real symbols. We can construct a junction tree for this code similar to the one in Example T.4. The chain in this junction tree would extend till the $(x_{T-2}, x_{T-1})$ vertex. The complexity order of this junction tree is still $M^2$, irrespective of the value of $T$, where $M$ is the size of the complex constellation used to encode the symbols $x_n$. The best ordering for conditional ML decoding of this code is to first condition on the variable $x_{\lceil (T-1)/2 \rceil}$. This would result in two conditional ML decoding groups, each of which generates a 'shorter' Toeplitz code whose delay is approximately $T/2$. Thus the CML decoding complexity grows with $M$ and $T$ as $M^{\lceil \log_2 T \rceil}$. It is interesting that though there is interference among the symbols, the GDL complexity is a constant independent of the number of symbols encoded by the code. These results can be extended to $n_t > 2$. For any $n_t \times T$ Toeplitz code there exists a junction tree whose complexity order is $M^{n_t}$. The CML decoding complexity however grows with the delay $T$.
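The growth of the CML order with the delay can be checked with a short recursion. The sketch below rests on two assumptions stated here rather than taken from the paper: the moral graph of the $2 \times T$ Toeplitz code is treated as a chain of $T - 1$ complex variables, and the decoder always conditions on the middle variable of the current chain, as in Example T.4. The function name is hypothetical.

```python
def cml_exponent_chain(num_vars):
    """Exponent of M in the CML order for a chain of interfering variables.

    The decoder conditions on the middle variable (one factor of M) and then
    decodes the two shorter chains independently, so the exponent is one
    plus the larger of the two sub-chain exponents.
    """
    if num_vars <= 0:
        return 0
    if num_vars == 1:
        return 1
    middle = (num_vars + 1) // 2          # 1-based index of the conditioned variable
    left, right = middle - 1, num_vars - middle
    return 1 + max(cml_exponent_chain(left), cml_exponent_chain(right))


def gdl_exponent_chain(num_vars):
    """Exponent of M for the chain junction tree with vertices (x_n, x_{n+1})."""
    return min(num_vars, 2)
```

For the nine variables of the $2 \times 10$ code the recursion gives exponent 4 against a GDL exponent of 2, in line with Example T.4. Under the chain assumption the recursion equals the bit length of $T - 1$, which is the same number as $\lceil \log_2 T \rceil$.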

2) Overlapped Alamouti Codes (OACs) HWll : These codes 
are 2-group ML decodable and are available for all choices 
of T > nt > 2. They can be GDL decoded with complexity 
order ML~^J. The CML decoding complexity on the other 
hand grows with the number of symbols or equivalently with 
the delay $T$. For example, for $n_t = 4$, the CML complexity grows as $M^{\lceil \log_2 (T/2) \rceil}$. As an example we construct a junction tree for the $4 \times 14$ OAC and show that its complexity order is less than the CML decoding complexity.

The $4 \times 14$ OAC consists of 24 real symbols $s_1, \dots, s_{24}$. Define the auxiliary variables $z_1, \dots, z_{12}$ as $z_n = s_{2n-1} + j s_{2n}$. The design in terms of these auxiliary variables is given in (9) at the top of the next page. The variables $z_n$, $n = 1, \dots, 12$, are encoded independently using a complex constellation of size $M$. Choose the encoding groups as $x_n = [s_{2n-1}\ s_{2n}]$ for $n = 1, \dots, 12$. The moral graph for the code is given in Fig. 17. The moral graph is not complete and hence from Lemma 2 this code admits fast GDL decoding. Since the moral graph is a disjoint union of two subgraphs, from Lemma 3 this code is 2-group




Fig. 18. A junction tree partition to decode the $4 \times 14$ OAC.

decodable. A junction tree partition to decode this code is shown in Fig. 18. Note that this partition consists of 2 subtrees, each of which is a junction tree for the subcode generated by one of the 2 ML decoding groups. The complexity order of this junction tree partition is $M^2$. When CML decoding is used, the least achievable complexity order is $M^3$. We explain the CML decoding for the first ML decoding group. The decoding of the second group is similar. On fixing the value of $x_5$, we get two conditional decoding groups. The first group $\{x_1, x_3\}$ is jointly decoded with complexity $M^2$ for each value of $x_5$. The second group, $\{x_7, x_9, x_{11}\}$, is again conditionally 2-group decoded with the two conditional groups being $\{x_7\}$ and $\{x_{11}\}$.

Exact decoding complexity:

Almost all STBCs of interest have the property that each encoding group has the same number of real symbols, say $t$, and the signal set sizes of all the groups are equal, i.e., $|\mathcal{A}_1| = |\mathcal{A}_2| = \cdots = |\mathcal{A}_N|$. If the average number of information bits carried by each real symbol is $\log_2 q$, then the signal set size is $|\mathcal{A}_n| = q^t$. For example, when $t = 2$ the real symbols $\{s_i\}$ are encoded pairwise, and $q^2$ is the size of the complex constellation used to encode each $x_n$. For the sake of analytical tractability, and considering the widespread prevalence of STBCs of this type in the literature, we restrict our analysis of the exact GDL and CML complexities to codes wherein the number of real symbols in each encoding group is the same and $|\mathcal{A}_n| = q^t$.

Let $\mathcal{C}$ be any code where all the symbols $x_n$, $n = 1, \dots, N$, are mutually interfering. We will refer to such codes as being fully-interfering. In Appendix C we compute the exact CML and GDL complexities of such a fully-interfering STBC. The CML algorithm performs a brute-force minimization of the ML metric over all $q^{Nt}$ values of $(s_1, \dots, s_{Nt})$. Its complexity is

$$C_{\mathrm{CML}}(\mathcal{C}) = q^{Nt}\left( 3\binom{Nt}{2} + 5Nt \right) - 1. \qquad (10)$$

To GDL decode this STBC, we use the junction tree of Fig. 1 in Section III-B. We employ a single-vertex GDL schedule with the root at any one of the $(x_n, x_m)$ vertices followed by traceback (using the core vertex as the root will contribute to the leading order term $q^{Nt}$, which is avoided here). The complexity of this GDL decoder is given in (11) at the top of the next page. Comparing the leading terms of (10) and (11), we see that when the real symbols $\{s_i\}$ are encoded independently of each other, i.e., when $t = 1$, the GDL is about 3 times less complex than the CML. When the symbols are encoded pairwise using a complex constellation, i.e., when $t = 2$, the GDL is approximately 12 times less complex than the CML decoder. For example, for any STBC obtained from Cyclic Division Algebras [38] that is not multigroup or conditionally multigroup decodable, the GDL decoder gives roughly a 12 times reduction in complexity compared to the CML decoder.

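Example 6: Consider the following 2 antenna code obtained from a Cyclic Division Algebra [38]

$$\begin{bmatrix} s_1 + j s_2 + \gamma (s_3 + j s_4) & \delta \left( s_5 + j s_6 - \gamma (s_7 + j s_8) \right) \\ s_5 + j s_6 + \gamma (s_7 + j s_8) & s_1 + j s_2 - \gamma (s_3 + j s_4) \end{bmatrix},$$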

where 7 = e^^, and 5 is any complex number which is 
transcendental over the field Q{y/j)- The complex symbols 
$s_{2n-1} + j s_{2n}$, $n = 1, \dots, 4$, are encoded using the 8-PSK signal set. For this code, there are $N = 4$ encoding groups, $x_n = [s_{2n-1}\ s_{2n}]$ for $n = 1, \dots, 4$, $t = 2$ and $q = \sqrt{8}$. All the four symbol groups are mutually interfering, and hence this STBC is fully-interfering. From (10), the CML decoder for this code involves 507,903 mathematical operations. On the other hand, using (11), we see that the GDL decoder involves only 26,718 operations, which is about 19 times less than the CML complexity. ■
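Example 7: Consider the following Field Extension code [38] for $n_t = 3$ transmit antennas

$$\begin{bmatrix} s_1 + j s_2 & \gamma (s_5 + j s_6) & \gamma (s_3 + j s_4) \\ s_3 + j s_4 & s_1 + j s_2 & \gamma (s_5 + j s_6) \\ s_5 + j s_6 & s_3 + j s_4 & s_1 + j s_2 \end{bmatrix},$$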

where j = and the complex symbols S2n-i+is2n, 

$n = 1, \dots, 3$, are encoded using the 8-PSK signal set. This code has $N = 3$ encoding groups, $x_n = [s_{2n-1}\ s_{2n}]$ for $n = 1, \dots, 3$, $t = 2$ and $q = \sqrt{8}$. This STBC is fully-interfering, and the CML and the GDL decoders for this code involve 38,399 and 2,758 operations respectively. Thus the GDL decoder provides a complexity reduction by a factor of 14 compared to the CML decoder. ■
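The CML operation counts quoted in Examples 6 and 7 can be reproduced numerically. The sketch below assumes that (10) reads $q^{Nt}\left(3\binom{Nt}{2} + 5Nt\right) - 1$; this reading is an assumption about the printed formula, adopted here because it matches both quoted counts. The function name and the choice to pass the per-group signal set size $q^t$ directly (which avoids the irrational $q = \sqrt{8}$) are also choices made for this illustration.

```python
from math import comb


def cml_operations(num_groups, symbols_per_group, signal_set_size):
    """Operation count of brute-force CML decoding of a fully-interfering STBC.

    num_groups:        N, the number of encoding groups.
    symbols_per_group: t, the number of real symbols per group.
    signal_set_size:   q**t, the signal set size of each group, so that
                       signal_set_size ** N equals q**(N*t).
    """
    num_real = num_groups * symbols_per_group
    num_codewords = signal_set_size ** num_groups
    return num_codewords * (3 * comb(num_real, 2) + 5 * num_real) - 1


# Example 6: N = 4 groups, t = 2, 8-PSK per group.
example6_cml = cml_operations(4, 2, 8)
# Example 7: N = 3 groups, t = 2, 8-PSK per group.
example7_cml = cml_operations(3, 2, 8)
```

Dividing these counts by the GDL counts quoted in the examples (26,718 and 2,758) gives ratios of roughly 19 and 14, as stated there.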

The number of computations involved in the GDL decoder 
is less than that of the CML decoder not just for fully- 
interfering codes, but for any STBC. 

Theorem 3: Let $\mathcal{C}$ be any STBC such that the number of real symbols in each encoding group of $\mathcal{C}$ is the same, and the signal set size for each of the encoding groups is equal. Then $C_{\mathrm{GDL}}(\mathcal{C}) < C_{\mathrm{CML}}(\mathcal{C})$.

Proof: The proof is given in Appendix D. ■

From Theorem 2 and Example T.4, we see that the GDL algorithm can provide improvements over CML decoders in terms of the order of ML decoding complexity as well.

C. Reduction in complexity with PAM signal sets 

When a real symbol is encoded using a PAM signal set, 
the optimal value of that variable, conditioned on the values 
of other information symbols, can be found by scaling and 
hard-limiting. This technique has been widely used in the 
literature lUH, |j20l, 1261, 1291, and can lead to gains in the 
order of the CML decoding complexity. In this subsection we 
show that such a reduction in complexity is possible with GDL 
as well. 

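We will now describe how a variable $x_{n_0}$, $n_0 \in \{1, \dots, N\}$ (not necessarily a PAM encoded single real symbol), can be removed from the GDL formulation. The global metric $\beta$ can be split into terms involving $x_{n_0}$ and terms not involving $x_{n_0}$ as

$$\beta = \alpha_{n_0}(x_{n_0}) + \sum_{m \in \mathcal{N}(n_0)} \alpha_{n_0,m}(x_{n_0}, x_m) + \sum_{n \neq n_0} \alpha_n(x_n) + \sum_{\substack{n < m \\ n, m \neq n_0}} \alpha_{n,m}(x_n, x_m),$$

where $\mathcal{N}(n_0)$ is the set of indices of those variables that are neighbors of $x_{n_0}$ in the moral graph of the code. Define the functions

$$h_{n_0}(x_{\mathcal{N}(n_0)}) = \min_{x_{n_0}} \Big[ \alpha_{n_0}(x_{n_0}) + \sum_{m \in \mathcal{N}(n_0)} \alpha_{n_0,m}(x_{n_0}, x_m) \Big],$$

$$\beta'(x_{\{n_0\}^c}) = \min_{x_{n_0}} \beta(x_1, \dots, x_N).$$

Then we have

$$\beta'(x_{\{n_0\}^c}) = h_{n_0}(x_{\mathcal{N}(n_0)}) + \sum_{n \neq n_0} \alpha_n(x_n) + \sum_{\substack{n < m \\ n, m \neq n_0}} \alpha_{n,m}(x_n, x_m),$$

and the ML solution for $x_n$, $n \neq n_0$, is

$$\hat{x}_{\{n_0\}^c} = \arg\min \beta'(x_{\{n_0\}^c}).$$

Given the function $h_{n_0}(x_{\mathcal{N}(n_0)})$, the ML decoding of $\mathcal{C}$ is equivalent to minimizing $\beta'$. This minimization can be solved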
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Cgdl(C) = q 



Nt 



N 



N 



(2i-l) + 7V + l 
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{2f -t) + N{f + 3<) 



(11) 



using the GDL. If the function $h_{n_0}$ can be computed with sufficiently low complexity, using $\beta'$ rather than $\beta$ to ML decode $\mathcal{C}$ can lead to gains in the decoding complexity.

As we show now, when $x_{n_0}$ is a $q$-ary PAM encoded single real symbol, $h_{n_0}$ can be computed with reduced complexity using scaling and hard-limiting. For each $x_{\mathcal{N}(n_0)} \in \mathcal{A}_{\mathcal{N}(n_0)}$,



= min 



mGA/'(no) 



= min ^„ 



where C = C«o + E 
value x„o that minimizes hn„ 



2fno 



4£2 



^no. iSi- The optimal 
for a given value of $x_{\mathcal{N}(n_0)}$ can be found by the scaling and hard-limiting step given in (12) at the top of the next page, where $\mathrm{rnd}(\cdot)$ is the nearest integer function. This step has a constant complexity independent of $q$. The value of $h_{n_0}$ can then be calculated as



C 



26, 



4£2 



(13) 

We now use GDL to compute /i„o itself. From ( fTsT l, we see 
that the function depends on x^(„„) only through 



,.(xmj, where. 



Now 



7V== {mi,...,mp} andtJ™^(x™J = Eje^(m,) ^"o^^Sj 
consider the junction tree for this problem shown in Fig. [19] 
where the local kernel at the central vertex is £,na.na, and the 
local kernel at the vertex (x^ ) is Wm ■ It is straightforward 
to show that C is equal to the state of the central vertex of 
Fig. 19 at the end of the single-vertex GDL schedule rooted at this node. Using the table of values of C thus obtained, one can then compute $h_{n_0}$ using (12) and (13). Thus, the function $h_{n_0}$ can be computed with order of complexity $|\mathcal{A}_{\mathcal{N}(n_0)}|$ instead of the brute-force complexity order $q\,|\mathcal{A}_{\mathcal{N}(n_0)}|$.
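The scaling and hard-limiting step can be illustrated on a generic quadratic. The sketch below is not a transcription of (12); it assumes a $q$-ary PAM alphabet of odd integers $\{-(q-1), -(q-3), \dots, q-1\}$ and a cost of the form $a x^2 + b x$ with $a > 0$, where `a` and `b` stand for whatever coefficients result from fixing the neighboring variables. All names are hypothetical.

```python
import math


def pam_argmin_quadratic(a, b, q):
    """Minimise a*x**2 + b*x (a > 0) over the q-ary PAM alphabet.

    The alphabet is assumed to be the odd integers -(q-1), ..., q-1.
    The unconstrained minimiser is mapped to an index (scaling), rounded to
    the nearest integer, and clipped to 0..q-1 (hard-limiting). The cost of
    this step does not grow with q.
    """
    unconstrained = -b / (2.0 * a)
    index = math.floor((unconstrained + (q - 1)) / 2.0 + 0.5)  # scale and round
    index = min(max(index, 0), q - 1)                          # hard-limit
    return 2 * index - (q - 1)


def pam_brute_force_min(a, b, q):
    """Reference minimum value found by trying every PAM point."""
    return min(a * x * x + b * x for x in range(-(q - 1), q, 2))
```

Because the cost is a convex parabola, the PAM point nearest to the unconstrained minimiser is optimal, and clipping handles minimisers that fall outside the alphabet. This is why one table entry of $h_{n_0}$ costs a constant number of operations instead of $q$ evaluations.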

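If $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a junction tree for $\beta$, and $\mathcal{G}' = (\mathcal{V}', \mathcal{E}')$ is a junction tree for $\beta'$, such that

$$\max_{v' \in \mathcal{V}'} |\mathcal{A}_{I_{v'}}| < \max_{v \in \mathcal{V}} |\mathcal{A}_{I_v}| \quad \text{and} \quad |\mathcal{A}_{\mathcal{N}(n_0)}| < \max_{v \in \mathcal{V}} |\mathcal{A}_{I_v}|,$$

then ML decoding the code using the junction tree $\mathcal{G}'$ provides an improvement in the complexity order compared to using the junction tree $\mathcal{G}$.

Lemma 6: If the core $\mathcal{T}$ of $\mathcal{G}$ has only one vertex containing the variable $x_{n_0}$, then the tree $\mathcal{T}'$ obtained by removing $x_{n_0}$ from this vertex of $\mathcal{T}$ is a core for the GDL minimization of $\beta'$.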



Fig. 19. A junction tree to compute 



Proof: We will show that $\mathcal{T}'$ satisfies both the conditions of Definition 2 for minimizing $\beta'$. Since $\mathcal{T}$ satisfies the junction tree condition for all the variables $x_n$, $n = 1, \dots, N$, the tree $\mathcal{T}'$, obtained by removing the only occurrence of $x_{n_0}$, satisfies the junction tree condition for $x_n$, $n \neq n_0$. For every $n, m \neq n_0$ there exists a $v \in \mathcal{V}$ such that $\{n, m\} \subset I_v$, and hence there exists a $v' \in \mathcal{V}'$ such that $\{n, m\} \subset I_{v'}$. Suppose $v_0 \in \mathcal{V}$ is the only vertex of $\mathcal{G}$ that contains $x_{n_0}$. Because $\mathcal{T}$ is a core for the minimization of $\beta$, $\mathcal{N}(n_0) \subset I_{v_0}$ and hence this vertex in $\mathcal{T}'$ contains the argument of $h_{n_0}$ as a subset of its local domain. Therefore, $\mathcal{T}'$ can be used as a core for minimizing $\beta'$. ■

This technique of removing a PAM encoded variable can be generalized to any set $\mathcal{R} \subset \{1, \dots, N\}$ of variables that satisfies the condition given in Lemma 7 below. In this case, the variables $x_n$, $n \in \mathcal{R}$, are removed one by one from the GDL formulation, in an arbitrary order, using the same technique as above.

Lemma 7: The PAM encoded set of variables $x_{\mathcal{R}}$ can be removed from the GDL formulation using scaling and hard-limiting if and only if the subgraph of the moral graph generated by these variables is edgeless.

Proof: Let $\mathcal{R} = \{n_1, \dots, n_{|\mathcal{R}|}\}$ and let the chosen order of removal be $n_1, n_2, \dots, n_{|\mathcal{R}|}$. The variable $x_{n_1}$ can be removed using the technique described in this subsection, irrespective of the choice of $n_2, \dots, n_{|\mathcal{R}|}$. Suppose there exists an $n_r \in \mathcal{R}$ such that $n_r \in \mathcal{N}(n_1)$. Then, while removing $x_{n_r}$, one is faced with the minimization of the function



)) + a„,,(x„J 



over the variable $x_{n_r}$. However, $h_{n_1}$ is not a quadratic function of $x_{n_r}$, and hence minimization of the above expression via completion of squares, scaling and hard-limiting is not possible. On the other hand, when $n_r \notin \mathcal{N}(n_1)$, this step of minimizing $h_{n_1}$ does not arise during the removal of $x_{n_r}$ from the GDL formulation, and hence $x_{n_r}$ can be removed using scaling and hard-limiting. ■
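The condition of Lemma 7 is a check that the candidate set is an independent set of the moral graph. A minimal sketch follows; the function name is hypothetical, and the edge list used for the Golden code is an assumed graph consistent only with the facts stated in the text (within $\{x_1, \dots, x_4\}$ only $x_1, x_3$ and $x_2, x_4$ interfere, and $x_1$, $x_2$ each have five neighbors), not a transcription of Fig. 11.

```python
def removable_by_hard_limiting(candidates, edges):
    """True if no two candidate variables interfere with each other.

    candidates: indices of PAM encoded variables proposed for removal.
    edges:      interfering index pairs of the moral graph.
    """
    chosen = set(candidates)
    return not any(a in chosen and b in chosen for a, b in edges)


# Assumed moral graph for the Golden code with 8 single-symbol groups:
# x1-x3 and x2-x4 interfere, and each of x1..x4 interferes with x5..x8.
golden_edges = [(1, 3), (2, 4)] + [(n, m) for n in range(1, 5) for m in range(5, 9)]
```

With this assumed graph, the pair $\{x_1, x_2\}$ passes the check while $\{x_1, x_3\}$ does not, which is the situation exploited in Example G.4.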
For example, when a conditionally g-group decodable code 
is to be decoded, one PAM encoded symbol from each of the 
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x„„ ^ min i m..irnd ('i^ - ^) ,o} - 1 - (12) 




Fig. 20. A junction tree core T to decode the Golden Code. 




Fig. 21. A junction tree core $\mathcal{T}'$ for the Golden code that exploits the structure of the PAM signal set.

g conditional groups can be removed via scaling and hard- 
limiting. 

Example G.4: Consider the junction tree core $\mathcal{T}$ for the Golden code shown in Fig. 20. From Lemma 7 and the moral graph of the Golden code given in Fig. 11, we see that the variables $x_1$ and $x_2$ can be removed using scaling and hard-limiting. Using Lemma 6 we get the junction tree core $\mathcal{T}' = (\mathcal{V}', \mathcal{E}')$ shown in Fig. 21. Since $|\mathcal{N}(1)| = |\mathcal{N}(2)| = 5$, the functions $h_1$ and $h_2$ can be computed with complexity order $q^5$, where $q$ is the size of the PAM signal set used to encode the information symbols. Also, $\max_{v' \in \mathcal{V}'} |\mathcal{A}_{I_{v'}}| = q^5$ and hence the single-vertex GDL schedule and traceback can be implemented with order of complexity $q^5$. Hence, the order of complexity for GDL decoding of the Golden code using $\mathcal{T}'$ is $q^5$, whereas the complexity order of using $\mathcal{T}$ is $q^6$. The removal of the variables $x_1$ and $x_2$ has enabled the reduction of the GDL complexity order from $q^6$ to $q^5$. The total number of mathematical operations involved in the GDL decoding of
the Golden code using V is A2q^ + Qq^ + 21q^ + b2q - 5. 
The CML decoder JTS), on the other hand, involves 

76(7^ + AZq^ — 1 operations. Comparing the leading order 
terms, we see that the GDL decoder is about 1.8 times as fast as the CML decoder. For instance, when $q = 2$ or 4 (corresponding to the rates 4 and 8 bits per channel use), the GDL decoder gives a complexity reduction by a factor of 1.9 compared to the CML decoding algorithm.

On the other hand, consider the naive choice of symbol groups

$$x_1 = [s_1\ s_2\ s_3\ s_4]^T, \quad x_2 = [s_5\ s_6\ s_7\ s_8]^T,$$

given in Example G.1. The signal set size for each of these two symbol groups is $q^4$. Since the two symbol groups are interfering, any choice of junction tree $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ must involve a vertex $v_0$ that contains both the variables $x_1, x_2$. The GDL single-vertex decoding complexity has the complexity order $\max_{v \in \mathcal{V}} |\mathcal{A}_{I_v}| \ge q^8$, which is equal to the order of brute-force ML decoding complexity. ■

VI. Conclusion 

The CML decoding algorithm minimizes the ML metric $\beta(x_1, \dots, x_N)$ via removing a subset of variables from the problem formulation by minimizing $\beta$ for each instantiation of this subset of variables. This subset of variables is chosen in such a way that the reduced problem, obtained after their removal from $\beta$, splits into multiple, independent, less complex minimization problems. The GDL, on the other hand, computes various partial sums and marginalizations of $\beta$ involving the 'smaller', less complex functions $\alpha_n$, $\alpha_{n,m}$, and utilizes these intermediate functions to efficiently arrive at the ML solution. In this paper, we have introduced this GDL based ML decoding framework, and shown that the GDL decoder is superior to the CML decoder in terms of complexity. The results of this paper have brought to light the following relevant problems that need to be addressed.

• Proving the optimality or otherwise of GDL based de- 
coders in minimizing the complexity of ML decoding an 
STBC. 

• Given an STBC C, finding the optimal choice of weight 
matrices, encoding groups and signal sets, which will 
minimize the GDL decoding complexity of the code. 

• Constructing codes with better rate-decoding complexity 
tradeoff than that of the known codes using the GDL 
decoders. 

• Both GDL and CML decoding algorithms depend on 
the Hurwitz-Radon orthogonality of weight matrices to 
obtain low complexity ML decoders. Is there any other 
algebraic property of a code that can be exploited to de- 
sign low complexity ML decoders? Can it lead to further 
improvement in the rate-decoding complexity tradeoff? 
Appendix A 
Proof of Theorem 1 

First we will show that $x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ is a partition of 
$\{x_1, \dots, x_N\}$. It is clear that 
$\bigcup_{k=1}^{g} x(\mathcal{G}_k) = \{x_1, \dots, x_N\}$. 
It is enough to show that for any $\ell \neq k$, 
$x(\mathcal{G}_\ell) \cap x(\mathcal{G}_k) = \phi$. 
Suppose this is not true. Then there exists a variable $x_n$ that 
appears in the local domains of at least one of the vertices 
in each of $\mathcal{G}_\ell$ and $\mathcal{G}_k$. Since $\mathcal{G}$ satisfies the junction tree 
condition, the local domains of all the vertices on the unique 
path between these two vertices in $\mathcal{G}$ contain the variable $x_n$. 
Further, this unique path contains at least one of the edges 
$(u_k, v_k)$, $k = 1, \dots, (g - 1)$. Thus, there exists a $k$ such that 
$I_{u_k} \cap I_{v_k} \supseteq \{n\}$, and hence $I_{u_k} \cap I_{v_k} \neq \phi$, a contradiction. 
Thus $x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ is a partition of $\{x_1, \dots, x_N\}$. 

We will now show that for each $k = 1, \dots, g$, the tree $\mathcal{G}_k$ 
satisfies the junction tree condition. Let $x_n$ be any variable 
from the set $x(\mathcal{G}_k)$. From the first result of this theorem, 
$x_n$ appears in the local domains of the vertices of $\mathcal{G}_k$ only. 
Thus the subgraph of $\mathcal{G}$ formed by vertices containing $x_n$ is a 
subgraph of $\mathcal{G}_k$. Since $\mathcal{G}$ satisfies the junction tree condition, 
this subgraph is a connected graph. Hence $\mathcal{G}_k$ satisfies the 
junction tree condition. 

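We will now prove the last part of the theorem. Since 
$x(\mathcal{G}_1), \dots, x(\mathcal{G}_g)$ is a partition of $\{x_1, \dots, x_N\}$, none of the 
local domains of $\mathcal{G}$ involve any cross terms between $x(\mathcal{G}_\ell)$ 
and $x(\mathcal{G}_k)$ for any $\ell \neq k$. Therefore the global kernel $\beta$ can 
be written as 

$$\beta(x_1, \dots, x_N) = f_1(x(\mathcal{G}_1)) + f_2(x(\mathcal{G}_2)) + \cdots + f_g(x(\mathcal{G}_g)),$$

where, for $\ell = 1, \dots, g$, $f_\ell(x(\mathcal{G}_\ell))$ is the sum of the local 
kernels of all the vertices of $\mathcal{G}_\ell$. Let $v$ be any vertex of $\mathcal{G}$ 
and let it belong to the $k$-th subtree of $\mathcal{G}$. Let $\sigma_v$ be the state 
of the vertex $v$ after running the GDL all-vertex message- 
passing algorithm on $\mathcal{G}$, and $\sigma'_v$ be the state of the vertex 
after running the GDL all-vertex message-passing algorithm 
on $\mathcal{G}_k$ only. From the discussion in Section III-B, $\sigma_v$ is the 
$x_{I_v}$-marginalization of $\beta$, and $\sigma'_v$ is the $x_{I_v}$-marginalization 
of $f_k$. We have 

$$\sigma_v(x_{I_v}) = \min_{x_{I_v^c}} \beta = \min_{x_{I_v^c}} \sum_{\ell=1}^{g} f_\ell(x(\mathcal{G}_\ell)).$$

Since each of $f_1, \dots, f_g$ is a function of disjoint sets of 
variables, the min and the summation in the above equation 
can be interchanged. Observing that for all $\ell \neq k$, 
$x_{I_v^c} \cap x(\mathcal{G}_\ell) = x(\mathcal{G}_\ell)$, we have 

$$\sigma_v(x_{I_v}) = \sum_{\ell=1}^{g} \min_{x_{I_v^c} \cap x(\mathcal{G}_\ell)} f_\ell(x(\mathcal{G}_\ell))$$
$$= \min_{x_{I_v^c} \cap x(\mathcal{G}_k)} f_k(x(\mathcal{G}_k)) + \sum_{\ell \neq k} \min_{x(\mathcal{G}_\ell)} f_\ell(x(\mathcal{G}_\ell))$$
$$= \sigma'_v(x_{I_v}) + \sum_{\ell \neq k} a_\ell,$$

where $a_\ell$ denotes the real number $\min_{x(\mathcal{G}_\ell)} f_\ell(x(\mathcal{G}_\ell))$. Thus, 
for any vertex $v$ of $\mathcal{G}$, the functions $\sigma_v$ and $\sigma'_v$ differ only by 
a scalar. Therefore the solution to $x_{I_v}$ obtained from $\sigma'_v$ is 

$$\arg\min \sigma'_v(x_{I_v}) = \arg\min \Big( \sigma_v(x_{I_v}) - \sum_{\ell \neq k} a_\ell \Big) = \arg\min \sigma_v(x_{I_v}),$$

which is the solution obtained from $\sigma_v$, and 
hence is the ML solution. This completes the 
proof. ■ 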
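
The two facts used in the last part of this proof (a sum of functions of disjoint variable sets can be minimized term by term, and an additive constant does not move the arg min) can be checked numerically. The following is a minimal sketch, assuming two hypothetical variable groups with randomly generated cost tables; the names `f1`, `f2`, `A1`, `A2` and the alphabet sizes are illustrative and do not come from the paper.

```python
import itertools
import random

random.seed(0)

# Hypothetical alphabets for two disjoint variable groups x(G_1), x(G_2).
A1 = list(itertools.product(range(3), repeat=2))  # 9 joint values
A2 = list(itertools.product(range(4), repeat=1))  # 4 joint values

# Arbitrary real-valued sums of local kernels f_1 and f_2.
f1 = {a: random.random() for a in A1}
f2 = {b: random.random() for b in A2}


def beta(a, b):
    """Global kernel with no cross terms between the two groups."""
    return f1[a] + f2[b]


# Joint brute-force minimization over both groups.
pairs = [(a, b) for a in A1 for b in A2]
joint_argmin = min(pairs, key=lambda p: beta(*p))
joint_min = beta(*joint_argmin)

# Separate minimization of each group.
sep_argmin = (min(A1, key=f1.get), min(A2, key=f2.get))
sep_min = min(f1.values()) + min(f2.values())

# State of a vertex whose local domain is all of group 1:
# sigma is the marginalization of beta, sigma_prime that of f_1 alone.
sigma = {a: min(beta(a, b) for b in A2) for a in A1}
sigma_prime = dict(f1)
offsets = [sigma[a] - sigma_prime[a] for a in A1]
```

In this toy setting every entry of `offsets` equals `min(f2.values())` up to rounding, which plays the role of the scalar $a_\ell$ in the proof, so `sigma` and `sigma_prime` have the same minimizer.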

Appendix B 
Proof of Theorem 2 

In order to prove this theorem we categorize all STBCs 
into three classes: (i) multigroup decodable, (ii) conditionally 
multigroup decodable, and (iii) codes in which all the symbols 
are mutually interfering, which we will call fully-interfering 
STBCs. For g-group decodable codes the CML decoder splits 
into g independent CML decoders, one for each of the g sub- 
codes. Note that each subcode itself can be either conditionally 
multigroup decodable or fully-interfering. For multigroup and 
conditionally multigroup decodable codes 
$\mathcal{O}_{CML} < |\mathcal{C}|$. For fully-interfering codes CML 
reduces to brute-force decoding and hence 
$\mathcal{O}_{CML} = |\mathcal{C}|$. 

Fig. 22. Moral graph of the smallest conditionally multigroup decodable code. 

Fig. 23. A junction tree for the smallest conditionally multigroup decodable code. 

For each of the three classes of codes we now show that 
$\mathcal{O}_{GDL} \leq \mathcal{O}_{CML}$. For a fully-interfering STBC the junction tree 
in Section III-B can be used. The complexity of this GDL 
decoder is of the order of $|\mathcal{C}| = \mathcal{O}_{CML}(\mathcal{C})$. Since this decoder 
is only one instance of (possibly) several GDL algorithms for 
ML decoding this code, we have $\mathcal{O}_{GDL}(\mathcal{C}) \leq \mathcal{O}_{CML}(\mathcal{C})$. 

Now consider a g-group decodable code. The complexity 
of a CML decoder is the sum of the CML complexities of the g 
subcodes. As explained in Section IV-B, this code can be GDL 
decoded using a disjoint union of g junction trees, one tree 
corresponding to each of the g subcodes. Thus, the complexity 
of GDL decoding is the sum of the complexities of GDL decoding 
each of the g subcodes. Since the subcodes can be either 
conditionally multigroup decodable or fully-interfering, we 
only need to show that the theorem is true for conditionally 
multigroup decodable codes and fully-interfering codes in 
order to prove the theorem for g-group decodable codes. We 
have already proved the result for fully-interfering codes. In 
the remaining part of the proof we show that 
$\mathcal{O}_{GDL} \leq \mathcal{O}_{CML}$ for all conditionally multigroup decodable codes. 

The proof for conditionally multigroup decodable codes is 
via induction on $N$, the number of encoding groups of the 
STBC. The smallest $N$ for which such a code exists is 3 
and its corresponding moral graph is shown in Fig. 22. The 
conditional ML decoder for this code operates with $\Gamma = \{1, 2\}$ 
and its complexity order is 
$|\mathcal{A}_3| \max\{|\mathcal{A}_1|, |\mathcal{A}_2|\}$. To decode 
this code using GDL we can use the junction tree given in 
Fig. 23. The complexity order of this junction tree equals 
$|\mathcal{A}_3| \max\{|\mathcal{A}_1|, |\mathcal{A}_2|\} = \mathcal{O}_{CML}(\mathcal{C})$. Thus we have shown that 
$\mathcal{O}_{GDL} \leq \mathcal{O}_{CML}$ for $N = 3$. 

We now prove the induction step. Assume that the theorem 
is true for all conditionally multigroup decodable codes for 
which the number of encoding groups is less than $N$. We will 
now show that the result is true when the number of encoding 
groups is $N$ as well. Consider a CML decoder with complexity 
order $\mathcal{O}_{CML}(\mathcal{C})$ for a code $\mathcal{C}$ with $N$ variables. Suppose this 
decoder uses $\Gamma \subset \{1, \dots, N\}$. Let the subcode generated by 
$x_\Gamma$ be g-group decodable, i.e., let $\mathcal{C}$ be conditionally g-group 
decodable for this choice of $\Gamma$. If the g conditional groups are 
$\Gamma_1, \dots, \Gamma_g$, then the complexity order of this CML decoder is 
$\mathcal{O}_{CML}(\mathcal{C}) = |\mathcal{A}_{\Gamma^c}| \max_{k=1}^{g} \mathcal{O}_{CML}(\mathcal{C}_k)$, 
where $\mathcal{C}_k$ is the subcode 
generated by the variables $\{x_n \,|\, n \in \Gamma_k\}$. To complete the 
proof of this theorem it is enough to construct a junction tree 
for this code whose complexity order is at the most $\mathcal{O}_{CML}(\mathcal{C})$. 
For $k = 1, \dots, g$, the code $\mathcal{C}_k$ is either fully-interfering 
or conditionally multigroup decodable, and the number of 
encoding groups in $\mathcal{C}_k$ is less than $N$. Then there exists a GDL 
decoder for $\mathcal{C}_k$ whose complexity order is upper bounded by 
$\mathcal{O}_{CML}(\mathcal{C}_k)$. Let $T_k$ denote the junction tree core for this GDL 
decoder. Construct a tree $T'_k$ from $T_k$ by appending the variable 
list $x_{\Gamma^c}$ to the local domain of every vertex of $T_k$. We now 
construct a core $T$ using $T'_1, \dots, T'_g$ and one additional vertex 
with local domain $x_{\Gamma^c}$. For every $k = 1, \dots, g$, arbitrarily 
choose a vertex of $T'_k$ and connect it to the $x_{\Gamma^c}$ vertex using 
a single edge. 

Fig. 24. The tree $T$ in the proof of Theorem 2. 
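
The construction of the core from the subcode cores can be sketched in code, together with a direct check of the junction tree condition. This is a minimal sketch under stated assumptions: a tree is represented as a list of local domains (sets of variable indices) plus a list of edges between vertex positions, and the function names `is_junction_tree` and `combine_cores` as well as the five-variable example are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def is_junction_tree(domains, edges):
    """Return True if (domains, edges) is a tree in which, for every
    variable, the vertices containing it induce a connected subgraph."""
    n = len(domains)
    if len(edges) != n - 1:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(nodes):
        nodes = set(nodes)
        if not nodes:
            return True
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen == nodes

    if not connected(range(n)):
        return False
    variables = set().union(*domains)
    return all(
        connected([i for i, d in enumerate(domains) if x in d])
        for x in variables
    )


def combine_cores(cores, cond_vars):
    """Append the conditioning variables to every local domain of each
    subcode core and hang each enlarged core off one extra vertex."""
    cond = frozenset(cond_vars)
    domains, edges = [cond], []  # vertex 0 has local domain Gamma^c
    for sub_domains, sub_edges in cores:
        offset = len(domains)
        domains += [frozenset(d) | cond for d in sub_domains]
        edges += [(u + offset, v + offset) for u, v in sub_edges]
        edges.append((0, offset))  # arbitrary choice: first vertex
    return domains, edges


# Hypothetical example: Gamma^c = {5}, Gamma_1 = {1, 2}, Gamma_2 = {3, 4}.
core_1 = ([{1, 2}, {1}, {2}], [(0, 1), (0, 2)])
core_2 = ([{3, 4}, {3}, {4}], [(0, 1), (0, 2)])
domains, edges = combine_cores([core_1, core_2], {5})
largest = max(len(d) for d in domains)
```

In this example the largest local domain grows from two variables to three, which mirrors the factor $|\mathcal{A}_{\Gamma^c}|$ multiplying the complexity order of each subcode.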

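It is straightforward to prove that $T$ is a valid junction tree 
core for ML decoding of the STBC $\mathcal{C}$. For every vertex in 
$T'_k$ the local domain size is upper bounded by 
$|\mathcal{A}_{\Gamma^c}| \, \mathcal{O}_{GDL}(\mathcal{C}_k)$. Therefore, 

$$\mathcal{O}_{GDL}(\mathcal{C}) \leq \max_{k=1}^{g} |\mathcal{A}_{\Gamma^c}| \, \mathcal{O}_{GDL}(\mathcal{C}_k) \leq \max_{k=1}^{g} |\mathcal{A}_{\Gamma^c}| \, \mathcal{O}_{CML}(\mathcal{C}_k) = \mathcal{O}_{CML}(\mathcal{C}).$$

This completes the proof. 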



Appendix C 

The CML and GDL Decoding Complexities of 
Fully-Interfering STBCs 

The CML algorithm for a fully-interfering code reduces to 
a brute-force search 

$$(\hat{s}_1, \dots, \hat{s}_{Nt}) = \arg\min f(s_1, \dots, s_{Nt}) = \arg\min \sum_{i=1}^{Nt} \left( s_i \xi_i + s_i^2 \xi_{i,i} \right) + \sum_{i<j} s_i s_j \xi_{i,j}.$$

For each of the $q^{Nt}$ values that $(s_1, \dots, s_{Nt})$ jointly assume, 
there are $Nt$ terms of type $s_i \xi_i + s_i^2 \xi_{i,i}$ to be computed, and 
each such term involves 4 operations. There are $\binom{Nt}{2}$ terms of 
the type $s_i s_j \xi_{i,j}$ and each term involves 2 operations. Taking 
into account the process of summing up these individual terms, 
the total number of operations in computing $f$ for a given 
$(s_1, \dots, s_{Nt})$ is $3\binom{Nt}{2} + 5Nt - 1$. Finding the arg min of the 
resulting $q^{Nt}$ values of $f$ takes a further $(q^{Nt} - 1)$ operations. 
Thus, the CML decoding complexity is 

$$C_{CML} = q^{Nt} \left( 3\binom{Nt}{2} + 5Nt \right) - 1.$$
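
The operation count for brute-force decoding can be reproduced by tallying the individual contributions described above. The following is a minimal sketch; the function names are hypothetical, and the closed form it is compared against is the expression $q^{Nt}\left(3\binom{Nt}{2} + 5Nt\right) - 1$ as read from the counts in the text.

```python
from math import comb


def ops_per_metric(n_sym):
    """Operations needed to evaluate the metric f at a single point."""
    linear_terms = n_sym           # s_i*xi_i + s_i^2*xi_ii: 4 operations each
    cross_terms = comb(n_sym, 2)   # s_i*s_j*xi_ij: 2 operations each
    additions = linear_terms + cross_terms - 1  # summing all the terms
    return 4 * linear_terms + 2 * cross_terms + additions


def cml_ops_fully_interfering(q, N, t):
    """Brute-force search: evaluate f everywhere, then take the arg min."""
    n_sym = N * t
    points = q ** n_sym
    return points * ops_per_metric(n_sym) + (points - 1)


def cml_closed_form(q, N, t):
    """Closed-form count q^{Nt} (3 C(Nt, 2) + 5 Nt) - 1."""
    n_sym = N * t
    return q ** n_sym * (3 * comb(n_sym, 2) + 5 * n_sym) - 1
```

For any choice of `q`, `N` and `t`, the tallied count and the closed form agree, since the `points - 1` comparisons combine with the `-1` inside `ops_per_metric` to leave a single `-1`.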



The GDL decoding of $\mathcal{C}$ involves three steps: computing 
the kernels $\alpha_n$, $\alpha_{n,m}$, running the GDL message-passing 
algorithm, and finally the traceback. We use the junction tree 



of Fig. [U to decode this STBC. There are N kernels of the 
type $\alpha_n(x_n)$. Using the distributive law, $\alpha_n$ can be expressed 
in terms of $\{s_i\}$ as 



q;„(x„) = ^ + Si^i^i) + ^ s,: 



J 



where $\psi(n)$ is the set of indices of $\{s_i\}$ that belong to 
the $n$-th encoding group. The computation of $\alpha_n$ using the 
above expression involves $q^t(t^2 + 3t)$ operations. There are 
$\binom{N}{2}$ kernels of the type $\alpha_{n,m}$. Again, with the help of the 
distributive law, we rewrite $\alpha_{n,m}$ as 




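$$\alpha_{n,m}(x_n, x_m) = \sum_{i \in \psi(n)} s_i \Big( \sum_{j \in \psi(m)} s_j \xi_{i,j} \Big). \qquad (14)$$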



The $t q^t$ values of the term $\sum_{j \in \psi(m)} s_j \xi_{i,j}$, one for each pair 
$(i, x_m)$, are precomputed, and then these values are used in (14) 
to compute $\alpha_{n,m}$. This two step method provides complexity 
reduction compared to the direct computation of $\alpha_{n,m}$, and can 
be implemented with $q^{2t}(2t - 1) + q^t(2t^2 - t)$ operations. 
Using the expression for the complexity of the GDL message-passing 
schedule, we see that implementing the schedule takes up 
$q^{Nt}\binom{N}{2} + q^{Nt} N$ operations. Note that the 
highest order term appearing so far is $q^{Nt}$. The root vertex 
for the single-vertex GDL and traceback must therefore be 
chosen in such a way that the complexity of this last step 
does not contribute to the $q^{Nt}$ term. Choosing any vertex of 
the type $(x_n, x_m)$ will satisfy this requirement as it leads to a 
traceback complexity of $q^{(N-2)t} + q^{2t} - 2$. Summing up the 
individual terms, we have the expression for $C_{GDL}(\mathcal{C})$ given 

in dnii. 
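
The message-passing and kernel counts quoted in this appendix can be reproduced from a per-edge cost. The sketch below rests on two assumptions that are not stated explicitly at this point of the text: that the junction tree for a fully-interfering code is a star whose central vertex holds all $N$ variables, with one leaf per kernel $\alpha_{n,m}$ and one per kernel $\alpha_n$, and that each edge $(u, v)$ costs $|\mathcal{A}_{I_u}| + |\mathcal{A}_{I_v}| - |\mathcal{A}_{I_u \cap I_v}|$ operations. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def edge_cost(dom_u, dom_v, q, t):
    """Cost of one edge; each variable has an alphabet of size q**t."""
    def size(domain):
        return q ** (t * len(domain))
    return size(dom_u) + size(dom_v) - size(dom_u & dom_v)


def star_schedule_cost(q, N, t):
    """Sum of edge costs over the assumed star-shaped junction tree."""
    centre = frozenset(range(N))
    leaves = [frozenset(pair) for pair in combinations(range(N), 2)]
    leaves += [frozenset([n]) for n in range(N)]
    return sum(edge_cost(leaf, centre, q, t) for leaf in leaves)


def pair_kernel_ops(q, t):
    """Two-step computation of one kernel alpha_{n,m}."""
    precompute = t * q ** t * (2 * t - 1)  # inner sums, 2t-1 operations each
    combine = q ** (2 * t) * (2 * t - 1)   # outer sum for each (x_n, x_m)
    return precompute + combine
```

Under these assumptions every edge costs exactly $q^{Nt}$, so the schedule total is $q^{Nt}\left(\binom{N}{2} + N\right)$, and `pair_kernel_ops` reduces to $q^{2t}(2t - 1) + q^t(2t^2 - t)$.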

Appendix D 
Proof of Theorem 3 

The proof of Theorem 3 is similar to the proof of Theorem 2 
given in Appendix B. Here too, we consider three cases: 
(i) multigroup decodable codes, (ii) conditionally multigroup 
decodable codes, and (iii) fully-interfering codes. From the 
discussion in Appendix B, we see that it is enough to prove 
the theorem for fully-interfering codes and conditionally 
multigroup decodable codes. In Appendix C, we have derived the 
GDL and CML complexities of fully-interfering codes, and the 
comparison of their leading order terms shows that for such 
codes $C_{GDL}(\mathcal{C}) < C_{CML}(\mathcal{C})$. 

We now prove the result for conditionally multigroup 
decodable codes by induction on $N$. The smallest such code 
involves $N = 3$ encoding groups, and its moral graph is shown 
in Fig. 22. The CML decoder minimizes 

$$\beta = \alpha_3(x_3) + \alpha_{1,3}(x_1, x_3) + \alpha_{2,3}(x_2, x_3) + \alpha_1(x_1) + \alpha_2(x_2),$$

by conditioning on $x_3$. For each of the $q^t$ values of $a_3 \in \mathcal{A}_3$, 
the CML decoder computes the scalar $\alpha_3(a_3)$ and the 
functions $\alpha_{1,3}(x_1, a_3)$, $\alpha_{2,3}(x_2, a_3)$. It then independently 
minimizes $\alpha_{1,3}(x_1, a_3) + \alpha_1(x_1)$ and $\alpha_{2,3}(x_2, a_3) + \alpha_2(x_2)$, and 
finds the conditionally optimal values $\hat{x}_1(a_3)$ and $\hat{x}_2(a_3)$. 
From the $q^t$ resulting values of $\beta(\hat{x}_1(x_3), \hat{x}_2(x_3), x_3)$, the 
optimal solution is obtained. The complexity of this algorithm 
can be shown to be 

$$q^{2t} \left( 3t^2 + 7t \right) + q^t \left( 4t^2 + 3\binom{t}{2} + 5t \right) - 1.$$



The GDL decoder can be implemented on the junction tree 
shown in Fig. 23. The GDL complexity involves the cost of 
computing the kernels $\alpha_n$, $n = 1, 2, 3$, $\alpha_{1,3}$ and $\alpha_{2,3}$, running 
the single-vertex GDL schedule with root vertex $(x_3)$, and the 
traceback to find the optimal solution. The complexity of this 
algorithm is 

$$q^{2t} (4t + 2) + q^t (7t^2 + 7t + 3) - 3.$$

Comparing the leading terms, we see that the GDL is less 
complex than the CML decoder. Hence the theorem is true 
for $N = 3$. 
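
The comparison for $N = 3$ can be tabulated numerically. The sketch below is illustrative only: the two formulas are written as $q^{2t}(3t^2 + 7t) + q^t\left(4t^2 + 3\binom{t}{2} + 5t\right) - 1$ for the CML decoder and $q^{2t}(4t + 2) + q^t(7t^2 + 7t + 3) - 3$ for the GDL decoder, which is an assumed reading of the two complexity expressions above, and the function names are hypothetical.

```python
from math import comb


def cml_ops_n3(q, t):
    """Assumed CML operation count for the smallest (N = 3) code."""
    lead = q ** (2 * t) * (3 * t * t + 7 * t)
    lower = q ** t * (4 * t * t + 3 * comb(t, 2) + 5 * t)
    return lead + lower - 1


def gdl_ops_n3(q, t):
    """Assumed GDL operation count for the smallest (N = 3) code."""
    lead = q ** (2 * t) * (4 * t + 2)
    lower = q ** t * (7 * t * t + 7 * t + 3)
    return lead + lower - 3


def leading_ratio(t):
    """Ratio of the leading coefficients, CML over GDL."""
    return (3 * t * t + 7 * t) / (4 * t + 2)


# Operation counts for a few PAM sizes q and group sizes t.
table = {(q, t): (gdl_ops_n3(q, t), cml_ops_n3(q, t))
         for q in (2, 4, 8) for t in (1, 2, 3)}
```

With these expressions the GDL count is below the CML count at every tabulated point, and the gap in the leading coefficients widens as `t` grows.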

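Now consider any conditionally multigroup decodable code 
with $N \geq 4$ encoding groups, and assume that the theorem is 
true for all codes with number of encoding groups less than $N$. 
Assume that the variables corresponding to $\Gamma \subset \{1, \dots, N\}$ 
are g-group decodable conditioned on the variable list $x_{\Gamma^c}$. If 
the g conditional groups are $\Gamma_1, \dots, \Gamma_g$, the ML metric $\beta$ can 
be expressed as 

$$\beta = \sum_{n \in \Gamma^c} \alpha_n + \sum_{\substack{n, m \in \Gamma^c \\ n < m}} \alpha_{n,m} + \sum_{k=1}^{g} \Big( \sum_{\substack{n \in \Gamma_k \\ m \in \Gamma^c}} \alpha_{n,m} + \sum_{n \in \Gamma_k} \alpha_n + \sum_{\substack{n, m \in \Gamma_k \\ n < m}} \alpha_{n,m} \Big).$$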

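The CML decoder proceeds as follows. For each of the $q^{|\Gamma^c| t}$ 
values $(a_n \,|\, n \in \Gamma^c) \in \mathcal{A}_{\Gamma^c}$ that the variable list $x_{\Gamma^c}$ jointly 
assumes, the CML decoder computes the scalar 

$$\sum_{n \in \Gamma^c} \alpha_n(a_n) + \sum_{\substack{n, m \in \Gamma^c \\ n < m}} \alpha_{n,m}(a_n, a_m),$$

and the functions $\alpha_{n,m}(x_n, a_m)$ for each $n \in \Gamma$ and $m \in \Gamma^c$. 
It then minimizes the metric 

$$\sum_{k=1}^{g} \Big( \sum_{\substack{n \in \Gamma_k \\ m \in \Gamma^c}} \alpha_{n,m}(x_n, a_m) + \sum_{n \in \Gamma_k} \alpha_n(x_n) + \sum_{\substack{n, m \in \Gamma_k \\ n < m}} \alpha_{n,m}(x_n, x_m) \Big)$$

by multigroup decoding. Minimizing each of the terms 
corresponding to $k = 1, \dots, g$ in the above equation is equivalent 
to decoding the code $\mathcal{C}_k$ generated by $x_{\Gamma_k}$, by its own CML 
decoder, and hence each of these terms can be minimized with 
complexity $C_{CML}(\mathcal{C}_k)$. Thus, corresponding to each 
$a_{\Gamma^c} \in \mathcal{A}_{\Gamma^c}$ we have a list $\hat{x}_n(a_{\Gamma^c})$, $n \in \Gamma$, of conditionally optimal 
solutions. Finally, from the $q^{|\Gamma^c| t}$ values of 
$\beta(\hat{x}_\Gamma(x_{\Gamma^c}), x_{\Gamma^c})$, 
the optimal tuple $(\hat{x}_\Gamma(x_{\Gamma^c}), x_{\Gamma^c})$ is chosen. The number of 
operations involved in this algorithm is given in (15). Note that 
the contribution to the leading order term of $C_{CML}(\mathcal{C})$ comes from 
$q^{|\Gamma^c| t} \sum_{k=1}^{g} C_{CML}(\mathcal{C}_k)$. 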

Let $G_1, \dots, G_g$ be the junction trees for $\mathcal{C}_1, \dots, \mathcal{C}_g$ with 
minimal decoding complexities. Since the number of encoding 
groups in each of the codes $\mathcal{C}_k$ is less than $N$, the result of this 
theorem is true for these codes, i.e., $C_{GDL}(\mathcal{C}_k) < C_{CML}(\mathcal{C}_k)$, for 
$k = 1, \dots, g$. We now construct a junction tree for $\mathcal{C}$ using 
$G_1, \dots, G_g$. For each $k = 1, \dots, g$, append the variable list 
$x_{\Gamma^c}$ to each of the vertices of $G_k$ and set all the local kernels 
to zero. From this resulting tree $G'_k$, arbitrarily choose a vertex 
of type $(x_n, x_{\Gamma^c})$, $n \in \Gamma_k$, and connect it to an exterior $(x_{\Gamma^c})$ 
vertex by a single edge, as shown in Fig. 25. Set the local 
kernel at $(x_{\Gamma^c})$ to zero as well. We now use this tree as the 
core for the STBC $\mathcal{C}$. For each $n, m \in \Gamma_k$, assign the kernel 
$\alpha_{n,m}$ to the vertex $(x_n, x_m, x_{\Gamma^c})$ of $G'_k$. For every $n \in \Gamma_k$, 
assign the kernel $\alpha_n$ to the vertex $(x_n, x_{\Gamma^c})$ of $G'_k$. For each 
pair $n \in \Gamma_k$ and $m \in \Gamma^c$, attach a new vertex $(x_n, x_m)$ with 
kernel $\alpha_{n,m}$ to the vertex $(x_n, x_{\Gamma^c})$ of $G'_k$ by a single edge. 
Attach all the vertices of the type $(x_n, x_m)$, $n, m \in \Gamma^c$, with 
kernel $\alpha_{n,m}$, and all the vertices $(x_n)$, $n \in \Gamma^c$, with kernel $\alpha_n$, 
to the $(x_{\Gamma^c})$ vertex using single edges. It is straightforward to 
show that this resulting tree $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a junction tree for 
$\mathcal{C}$. 

Fig. 25. The tree $T$ in the proof of Theorem 3. 

If each of the codes $\mathcal{C}_k$, $k = 1, \dots, g$, consists of just 
one encoding group, then every $G_k$ will consist of 
just one vertex, and a direct calculation of the number of 
operations involved in GDL decoding using $\mathcal{G}$ shows that 
$C_{GDL}(\mathcal{C}) < C_{CML}(\mathcal{C})$. If otherwise, then there exists at least 
one component $G_k$ with two or more encoding groups. Define 
$s = \max_{v \in \mathcal{V}} |I_v|$. Since there is at least one pair of 
interfering symbols in $\Gamma$, we have $s \geq 2 + |\Gamma^c|$. Let $S$ be the 
set of 'largest' vertices in $\mathcal{G}$, i.e., $S = \{v \in \mathcal{V} \,|\, |I_v| = s\}$. 
Now consider the contribution of each of the three steps: 
computation of kernels $\alpha_n$ and $\alpha_{n,m}$, running the single-vertex 
GDL schedule with root $(x_{\Gamma^c})$, and traceback, to the leading 
term of $C_{GDL}(\mathcal{C})$. The kernels can be computed with the 
order of complexity $q^{2t}$. The complexity of the GDL 
single-vertex schedule is of the order of $q^{st}$, and the traceback 
implementation requires a complexity order less than $q^{st}$. 
Since $s \geq 2 + |\Gamma^c|$, the only contribution to the leading order 
term comes from the GDL single-vertex schedule. Recall that 
$C_{GDL}(\mathcal{C}) = \sum_{(u,v) \in \mathcal{E}} \left( |\mathcal{A}_{I_u}| + |\mathcal{A}_{I_v}| - |\mathcal{A}_{I_u \cap I_v}| \right)$. 
The contribution to the leading order term of $C_{GDL}(\mathcal{C})$ comes from 
the set of all the edges in $\mathcal{E}$ that are incident on the vertices 
belonging to $S$. Clearly, every $v \in S$ belongs to one of the 
$G'_k$, corresponding to a subcode with two or more encoding 
groups. From the construction of $\mathcal{G}$, we see that the degree 
and the edges associated with any vertex from $S$ in $\mathcal{G}$ are the 
same as the degree and the edges associated with that vertex 
in the corresponding junction tree $G_k$. It is exactly this set of 
edges in each $G_k$ that contributes to the leading order terms of 
$C_{GDL}(\mathcal{C}_k)$. Since $\mathcal{G}$ is only one of the many possible junction 
trees for $\mathcal{C}$, we have 
$C_{GDL}(\mathcal{C}) \leq q^{|\Gamma^c| t} \sum_{k=1}^{g} C_{GDL}(\mathcal{C}_k)$, up to 
the leading order term. From (15) and the assumption made for 
induction that $C_{GDL}(\mathcal{C}_k) < C_{CML}(\mathcal{C}_k)$, $k = 1, \dots, g$, we have 
Ccml(C) = q 



|r=it 



fc=i ^ 



+ bir'' \t + 2Nt + g -1 



(15) 



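$$C_{GDL}(\mathcal{C}) \leq q^{|\Gamma^c| t} \sum_{k=1}^{g} C_{GDL}(\mathcal{C}_k) < q^{|\Gamma^c| t} \sum_{k=1}^{g} C_{CML}(\mathcal{C}_k) < C_{CML}(\mathcal{C}).$$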



This completes the proof. 